Testing for a change of the innovation 
distribution in nonparametric autoregression 
— the sequential empirical process approach 

Leonie Selk* and Natalie Neumeyer 
University of Hamburg, Department of Mathematics 
Bundesstrasse 55, 20146 Hamburg, Germany 

January 24, 2012 

Abstract 

We consider a nonparametric autoregression model under conditional heteroscedasticity
with the aim to test whether the innovation distribution changes in time. To this
end we develop an asymptotic expansion for the sequential empirical process of
nonparametrically estimated innovations (residuals). We suggest a Kolmogorov-Smirnov
statistic based on the difference of the estimated innovation distributions built from
the first $\lfloor ns \rfloor$ and the last $n - \lfloor ns \rfloor$ residuals, respectively ($0 < s < 1$). Weak
convergence of the underlying stochastic process to a Gaussian process is proved under the
null hypothesis of no change point. The result implies that the test is asymptotically
distribution-free. Consistency against fixed alternatives is shown. The small sample
performance of the proposed test is investigated in a simulation study and the test is
applied to data examples.

1 Introduction

Assume we have observed a time series that can be modelled via an autoregression model, 
possibly with conditional heteroscedasticity. We aim at testing for a change point in the
innovation distribution. Tests for change points in the distribution of time series data have
received a lot of attention in mathematical statistics; see Picard (1985), Giraitis, Leipus & 
Surgailis (1996), Horvath, Kokoszka & Teyssiere (2001), Inoue (2001), Boldin (2002), Lee & 
Na (2004), Huskova, Praskova & Steinebach (2007), Huskova, Kirch, Praskova & Steinebach 
(2008), among others. Recently, an online-monitoring procedure to detect changes in the 
innovation distribution of linear autoregressive models was developed by Hlavka, Huskova, 
Kirch & Meintanis (2012). Those tests have applications in different areas, e.g. finance, 
climate science and medicine. For instance, financial time series are tested for changes in
volatility or returns (see e.g. Andreou & Ghysels (2009)), and for climate control reasons the
annual water flow of rivers is tested for changes (see Huskova & Antoch (2003)).

Classical tests for change points in the distribution of independent data are often based
on the difference of empirical distributions of the first $\lfloor ns \rfloor$ and the last $n - \lfloor ns \rfloor$ observations,
respectively ($0 < s < 1$). To derive asymptotic properties of the test, sequential empirical
processes are considered; see Shorack & Wellner (1986, p. 131) and also Csorgo, Horvath &
Szyszkowicz (1997). Those methods for independent data have been transferred to test for
change points in the innovation distribution of parametric time series models. Sequential
empirical processes based on estimated residuals and corresponding change point tests were
suggested by Bai (1994) for ARMA-models, by Koul (1996) in the context of nonlinear time
series and by Ling (1998) for nonstationary autoregressive models. Those articles are the
ones most similar in spirit to the paper at hand. However, we do not assume any parametric
model for either the autoregression function or the conditional variance function, but
use nonparametric kernel estimation methods. The (non-sequential) empirical process of
residuals in a nonparametric homoscedastic autoregressive time series model was considered
by Müller, Schick & Wefelmeyer (2009), who prove an asymptotic expansion. Moreover,
residual empirical processes play an important role in the test for multiplicative structure in
a nonparametric heteroscedastic time series regression model by Dette, Pardo-Fernández &
Van Keilegom (2009). On the other hand our approach is similar in spirit to Neumeyer & Van
Keilegom (2009), who consider change point tests for the error distribution in nonparametric
regression models with independent observations. However, in comparison to the latter three
articles the methods of proof in the paper at hand require considerably more technical effort
because both the time series structure of the data and the additional index $s \in [0, 1]$ in the
stochastic process have to be taken into account.
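
The classical statistic for independent data mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name `sequential_ks_statistic` and the scaling $\sqrt{n}\,(k/n)\,((n-k)/n)$ with $k = \lfloor ns \rfloor$ are choices made for this sketch, and the supremum over $t$ is evaluated at the observed data points, where the step functions change.

```python
import numpy as np


def sequential_ks_statistic(x):
    """Sup over k and t of sqrt(n) * (k/n) * ((n-k)/n) * |F_k(t) - F*_{n-k}(t)|."""
    x = np.asarray(x, dtype=float)
    n = x.size
    grid = np.sort(x)                                   # evaluate the step functions at the data
    ind = (x[:, None] <= grid[None, :]).astype(float)   # ind[j, i] = 1 if x_j <= grid_i
    cum = np.cumsum(ind, axis=0)                        # cum[k-1, i] = #{j <= k : x_j <= grid_i}
    total = cum[-1]
    best = 0.0
    for k in range(1, n):                               # k plays the role of floor(n * s)
        f_first = cum[k - 1] / k                        # empirical cdf of x_1, ..., x_k
        f_last = (total - cum[k - 1]) / (n - k)         # empirical cdf of x_{k+1}, ..., x_n
        weight = np.sqrt(n) * (k / n) * ((n - k) / n)
        best = max(best, weight * np.max(np.abs(f_first - f_last)))
    return best
```

For a sample whose first half is constant at one value and whose second half is constant at another, the maximum is attained at the true split point, which makes the role of the weighting factor easy to check by hand.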

We prove an asymptotic expansion for the sequential empirical process of residuals and 
prove weak convergence of the scaled and centered process to a Gaussian process. It can 
be seen from those results that the nonparametric estimation of the autoregression and 
variance function decisively changes the asymptotic behaviour in comparison to the case
where the innovations were known. The asymptotic expansion of the sequential process
is then used to show that nevertheless the Kolmogorov-Smirnov test for a change point as
described above is asymptotically distribution-free. As a by-product of our proofs we obtain
results on uniform rates of convergence of kernel estimators (see Lemma B.1 in the appendix).
Those are similar in spirit to results derived by Hansen (2008), but in contrast we avoid the 
stationarity assumption. Only some stabilization of the mean of innovation densities is 
needed (see assumption (F')), which allows us to apply the results to prove consistency of 
the test under the existence of a change point. We moreover present a simulation study 
which shows good approximations of the asymptotic level as well as good power properties 
of the test under the example models considered. As data applications we consider two 
financial time series, namely the quarterly GNP of the USA and the S&P 500 index. 

The paper is organized as follows. In section 2 we present the model, the nonparametric 
curve estimators and the stochastic process used for the change point test. In section 3 we 
list technical assumptions and present the asymptotic results for the sequential empirical 
process as well as for the process used for the change point test under the null hypothesis 
of no change point. Asymptotic results under fixed alternatives are presented in section 4. 
Section 5 is concerned with a homoscedastic modification of the model. In section 6 we 
present simulation results and consider the data examples. Section 7 concludes the paper, 
whereas all proofs are given in the appendix. 



2 Model, hypotheses and test statistic 

Let $(X_j)_{j \in \mathbb{Z}}$ be a real valued stochastic process following the heteroscedastic autoregressive
model of order one,

(AR) $X_j = m(X_{j-1}) + \sigma(X_{j-1}) \varepsilon_j$,

where the innovations $\varepsilon_j$, $j \in \mathbb{Z}$, are independent with $E[\varepsilon_j] = 0$ and $E[\varepsilon_j^2] = 1$ $\forall j$ and
$\varepsilon_j$ is independent of the past $X_k$, $k \le j - 1$, $\forall j$.

Assume we have observed $X_0, \dots, X_n$ and our aim is to test for a change point in the
innovation distribution. Thus we formulate the null hypothesis as

$H_0$: $\varepsilon_1, \dots, \varepsilon_n \sim F$

(with $F$ unknown) while the fixed alternative has the form

$H_1$: $\exists\, \theta_0 \in (0,1)$: $\varepsilon_1, \dots, \varepsilon_{\lfloor n\theta_0 \rfloor} \sim F$, $\varepsilon_{\lfloor n\theta_0 \rfloor + 1}, \dots, \varepsilon_n \sim \tilde{F}$, $F \ne \tilde{F}$

($F$, $\tilde{F}$ unknown). Let $\hat\varepsilon_j$ denote an estimator for the innovation $\varepsilon_j$, $j \in \mathbb{Z}$, to be defined below.
We consider a Kolmogorov-Smirnov type test statistic based on the stochastic process

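$\hat{T}_n(s,t) = \sqrt{n}\, \frac{\sum_{k=1}^{\lfloor ns \rfloor} w_{nk}}{n}\, \frac{\sum_{k=\lfloor ns \rfloor+1}^{n} w_{nk}}{n} \big( \hat{F}_{\lfloor ns \rfloor}(t) - \hat{F}^*_{n - \lfloor ns \rfloor}(t) \big)$, $s \in [0,1]$, $t \in \mathbb{R}$, (2.1)

where the sequential empirical processes are defined as

$\hat{F}_{\lfloor ns \rfloor}(t) = \sum_{j=1}^{\lfloor ns \rfloor} \frac{w_{nj}}{\sum_{k=1}^{\lfloor ns \rfloor} w_{nk}}\, I\{\hat\varepsilon_j \le t\}$,

$\hat{F}^*_{n - \lfloor ns \rfloor}(t) = \sum_{j=\lfloor ns \rfloor+1}^{n} \frac{w_{nj}}{\sum_{k=\lfloor ns \rfloor+1}^{n} w_{nk}}\, I\{\hat\varepsilon_j \le t\}$.

The weights are chosen as $w_{nj} = w_n(X_{j-1})$ with continuous weight function $w_n: \mathbb{R} \to [0, 1]$
such that for some sequences $a_n \to -\infty$, $b_n \to \infty$,

$w_n(x) = 1$ for $x \in [a_n + \kappa, b_n - \kappa]$, $\quad w_n(x) = 0$ for $x \notin [a_n, b_n]$, (2.2)

for some fixed $\kappa > 0$ independent of $n$. The weights are included in the definition of the
sequential empirical processes to avoid problems of kernel estimation in areas where only
few data are available, compare to Müller, Schick & Wefelmeyer (2009) and Dette,
Pardo-Fernández & Van Keilegom (2009). Further let the residuals be defined as

$\hat\varepsilon_j = \frac{X_j - \hat{m}(X_{j-1})}{\hat\sigma(X_{j-1})}$, $j = 1, \dots, n$,

for kernel regression and variance estimators

$\hat{m}(x) = \frac{\sum_{i=1}^{n} K\big(\frac{x - X_{i-1}}{c_n}\big) X_i}{\sum_{i=1}^{n} K\big(\frac{x - X_{i-1}}{c_n}\big)}$, (2.3)

$\hat\sigma^2(x) = \frac{\sum_{i=1}^{n} K\big(\frac{x - X_{i-1}}{c_n}\big) (X_i - \hat{m}(x))^2}{\sum_{i=1}^{n} K\big(\frac{x - X_{i-1}}{c_n}\big)}$, (2.4)

and $\hat\sigma(x) = (\hat\sigma^2(x))^{1/2}$. Here $K$ denotes a kernel function and $c_n$ a positive sequence of
bandwidths. For ease of presentation we use the same bandwidth $c_n$ to estimate $m$
and $\sigma$, though in practice it may be advisable to choose different bandwidths. The asymptotic
results presented in the paper remain valid when two different bandwidths according to the
assumptions (C) and (C'), respectively, in the next sections are chosen.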

We list model assumptions as well as assumptions on the estimators in the next two 
sections. 
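
The residuals built from the Nadaraya-Watson estimators (2.3) and (2.4) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `nw_residuals` is hypothetical, a Gaussian kernel is assumed for simplicity (it does not have the compact support required later in assumption (K)), and the input array is assumed to hold $X_0, \dots, X_n$.

```python
import numpy as np


def nw_residuals(x, bandwidth):
    """Residuals (X_j - m_hat(X_{j-1})) / sigma_hat(X_{j-1}), j = 1, ..., n.

    x holds X_0, ..., X_n; m_hat and sigma_hat^2 are Nadaraya-Watson estimators
    with a Gaussian kernel (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    lagged, response = x[:-1], x[1:]                     # X_{j-1} and X_j
    u = (lagged[:, None] - lagged[None, :]) / bandwidth  # u[j, i] = (X_{j-1} - X_{i-1}) / c_n
    k = np.exp(-0.5 * u ** 2)                            # kernel weights K(u[j, i])
    denom = k.sum(axis=1)
    m_hat = k @ response / denom                         # estimator (2.3) at x = X_{j-1}
    sq_dev = (response[None, :] - m_hat[:, None]) ** 2   # (X_i - m_hat(X_{j-1}))^2
    sigma2_hat = (k * sq_dev).sum(axis=1) / denom        # estimator (2.4) at x = X_{j-1}
    return (response - m_hat) / np.sqrt(sigma2_hat)
```

The sketch evaluates both estimators only at the observed lagged values, which is all that the residuals require; the weight function $w_n$ is left out and would enter only when the empirical distribution functions are formed.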



3 Asymptotic results under the null hypothesis 

Throughout this section we make use of the following assumptions.

(K) The kernel $K$ is a three times differentiable density with compact support $[-C, C]$ and
$\sup_{u \in [-C, C]} |K^{(\mu)}(u)| \le \bar{K} < \infty$, $\mu = 0, 1, 2, 3$. Moreover $K(C) = K(-C) = K'(C) =
K'(-C) = 0$ and $\int K(u) u \, du = 0$.

(C) The sequence of bandwidths $c_n$ fulfills

net (log nf -> 0, ( logra ^ o for all 77 > 0. 

Remark: As can be seen from the proof the first bandwidth condition can be replaced by
$n c_n^4 (q_n q_n^\sigma)^8 (q_n^f)^2 = O\big(n c_n^4 (\log n)^{8 r_q + 8 r_s + 2 r_f}\big) = o(1)$, where $q_n, q_n^\sigma, q_n^f, r_q, r_s, r_f$ are defined
in assumptions (X) and (M) below. The second bandwidth condition is equivalent to
the existence of some $\delta > 0$ such that

(o^Uo, (i^)!^ (31) 

UC ri nc S 



for all $\eta > 0$. The first condition in (3.1) is typical in the context of empirical processes
of nonparametrically estimated residuals, compare Dette, Pardo-Fernández & Van Keilegom
(2009) or Neumeyer & Van Keilegom (2009), while the log-factor stems from
the boundary truncation via the weight function. The second condition in (3.1) arises
at the very end of the proof of Lemma B.4 in appendix B due to a $\delta$-dependent covering
number. The constant $\delta$ is also used in Lemma B.2 in appendix B.

(I) For the interval $I_n = [a_n, b_n]$ there exists some $r_I < \infty$ such that $(b_n - a_n) = O((\log n)^{r_I})$.
Moreover $\int_{-\infty}^{a_n + \kappa} f_{X_0}(x)\,dx + \int_{b_n - \kappa}^{\infty} f_{X_0}(x)\,dx = o((\log n)^{-1})$.



(W) The weight function $w_n: \mathbb{R} \to [0, 1]$ fulfills (2.2) and is three times differentiable such
that $\sup_{n \in \mathbb{N}} \sup_{x \in \mathbb{R}} |w_n^{(\mu)}(x)| < \infty$ for $\mu = 1, 2, 3$.

(F) The innovations $\varepsilon_j$, $j \in \mathbb{Z}$, are identically distributed with distribution function $F$.
Their density $f$ is continuously differentiable and $\sup_{t \in \mathbb{R}} |f(t) t| < \infty$ as well as
$\sup_{t \in \mathbb{R}} |f'(t) t^2| < \infty$.

Remark: Due to the continuity of the density $f$ and the derivative $f'$ it follows that
also $\sup_{t \in \mathbb{R}} f(t) < \infty$, $\sup_{t \in \mathbb{R}} |f'(t)| < \infty$ and $\sup_{t \in \mathbb{R}} |f'(t) t| < \infty$.

(E) There exists some $b > 1 + \sqrt{3}$ such that $E[|X_0|^{2b}] < \infty$ and $E[|\varepsilon_1|^{2b}] < \infty$.

(X) The observations $X_j$, $j \in \mathbb{Z}$, are identically distributed and the process $(X_j)_{j \in \mathbb{Z}}$ is
$\alpha$-mixing with exponentially fast decaying mixing-coefficient $\alpha(n)$.
Their density $f_{X_0}$ is bounded and four times differentiable with bounded derivatives.
The density is also bounded away from zero on compact intervals and there exists some
$r_f < \infty$ such that $q_n^f = (\inf_{x \in I_n} f_{X_0}(x))^{-1} = O((\log n)^{r_f})$.

Remark: Assumptions (F) and (X) imply strong stationarity of the process $(X_j)_{j \in \mathbb{Z}}$.

(Z) It holds that

$\sup_{x \in J_n} \big( (|m(x)| + |\sigma(x)|)^{2k} f_{X_0}(x) \big) = O(1)$

and there exists some $1 \le j^* < \infty$ such that

$\sup_{x, x' \in J_n} \big( (|m(x)| + |\sigma(x)|)^k (|m(x')| + |\sigma(x')|)^k f_{X_0, X_{j-1}}(x, x') \big) = O(1)$

is valid for all $j > j^* + 1$, for $k = 1, 2$, $n \to \infty$ with
$J_n = [a_n - (C + c_n^{-1/2} n^{-1/2} (\log n)^{1/2}) c_n,\ b_n + (C + c_n^{-1/2} n^{-1/2} (\log n)^{1/2}) c_n]$.

(M) The regression function $m$ and the scale function $\sigma$ are four times differentiable and
there exist some $r_q, r_s < \infty$ and $q_n, q_n^\sigma$ with $q_n = O((\log n)^{r_q})$, $q_n^\sigma = O((\log n)^{r_s})$,
$(q_n)^{-1} = O(1)$, $(q_n^\sigma)^{-1} = O(1)$ such that $\sup_{x \in [a_n - C c_n, b_n + C c_n]} |m^{(\mu)}(x)| = O(q_n)$,
$\sup_{x \in [a_n - C c_n, b_n + C c_n]} |\sigma^{(\mu)}(x)| = O(q_n)$, $\mu = 0, 1, 2, 3, 4$ and $(\inf_{x \in I_n} |\sigma(x)|)^{-1} = O(q_n^\sigma)$.

An example for which the assumptions are fulfilled is the AR(1) model $X_j = 0.5 X_{j-1} + \varepsilon_j$
with standard normally distributed innovations $\varepsilon_j$, $j \in \mathbb{Z}$. Then the observations $X_j$, $j \in \mathbb{Z}$,
are identically $\mathcal{N}(0, \frac{4}{3})$ distributed and with $I_n = [-(\frac{8}{3} \log((\log n)^2))^{1/2} - \kappa,\ (\frac{8}{3} \log((\log n)^2))^{1/2} + \kappa]$,
a weight function that fulfills (W), a kernel function that fulfills (K) and a bandwidth
that fulfills (C), all assumptions are fulfilled. To this end note that exponential $\alpha$-mixing
holds for stationary models with $\lim_{|x| \to \infty} (|m(x)| + |\sigma(x)| \cdot E[|\varepsilon_j|^\tau]^{1/\tau}) / |x| < 1$ for some $\tau \ge 1$
with $E[|\varepsilon_j|^\tau] < \infty$; see Doukhan (1994).
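
The stationary variance in this example follows from the geometric series $\mathrm{Var}(X_j) = \sum_{k \ge 0} 0.5^{2k} = 1/(1 - 0.25) = 4/3$. The following short check is a sketch only; the function name `ar1_stationary_variance` and its optional `terms` argument are introduced here for illustration.

```python
def ar1_stationary_variance(phi, innovation_var=1.0, terms=None):
    """Variance of the stationary AR(1) process X_j = phi * X_{j-1} + eps_j.

    With terms=None the closed form innovation_var / (1 - phi^2) is returned,
    otherwise the partial sum of the series innovation_var * sum_k phi^(2k).
    """
    if abs(phi) >= 1.0:
        raise ValueError("a stationary solution requires |phi| < 1")
    if terms is None:
        return innovation_var / (1.0 - phi ** 2)
    return innovation_var * sum(phi ** (2 * k) for k in range(terms))
```

Comparing the closed form with a long partial sum for `phi = 0.5` reproduces the value 4/3 used for the marginal distribution of the observations.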

In the first theorem we state a stochastic expansion of the residual based sequential 
empirical process. The proof is given in appendix A. 

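Theorem 3.1 Under model (AR) with assumptions (K), (C), (I), (W), (F), (E), (X), (Z),
and (M) we have that under the null hypothesis $H_0$ of no change point

$\frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} w_{nj} \big( I\{\hat\varepsilon_j \le t\} - F(t) \big)$
$\quad = \frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} \big( I\{\varepsilon_j \le t\} - F(t) \big) + \frac{\lfloor ns \rfloor}{n} f(t) \frac{1}{n} \sum_{j=1}^{n} \varepsilon_j + \frac{\lfloor ns \rfloor}{n} t f(t) \frac{1}{2n} \sum_{j=1}^{n} (\varepsilon_j^2 - 1) + o_P\big(\tfrac{1}{\sqrt{n}}\big)$

uniformly with respect to $s \in [0, 1]$ and $t \in \mathbb{R}$.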

Remark 3.2 The theorem complements results by Müller, Schick & Wefelmeyer (2009)
and Dette, Pardo-Fernández & Van Keilegom (2009). In both articles only non-sequential
processes are considered (i.e. the case $s = 1$). While Müller, Schick & Wefelmeyer (2009)
consider a homoscedastic version of model (AR) ($\sigma \equiv$ const, see also section 5), Dette,
Pardo-Fernández & Van Keilegom (2009) consider a heteroscedastic autoregression/regression model
and a result similar to Theorem 3.1 (for $s = 1$) can be derived from their proofs. The
sequential process ($s \in [0, 1]$), though, requires much more involved methods of proof that also
result in slightly more complicated assumptions. ■

From the stochastic expansion weak convergence of the sequential residual process can
be derived. The proof of Corollary 3.3 is given in appendix A.



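Corollary 3.3 Under the assumptions of Theorem 3.1 under the null hypothesis $H_0$ of no
change point the process

$\sqrt{n}\, \frac{\sum_{k=1}^{\lfloor ns \rfloor} w_{nk}}{n} \big( \hat{F}_{\lfloor ns \rfloor}(t) - F(t) \big)$, $s \in [0, 1]$, $t \in \mathbb{R}$,

converges weakly to a centered Gaussian process $(K_F(s, t))_{s \in [0,1], t \in \mathbb{R}}$ with

$\mathrm{Cov}(K_F(s_1, t_1), K_F(s_2, t_2)) = (s_1 \wedge s_2) \big( F(t_1 \wedge t_2) - F(t_1) F(t_2) \big)$
$\quad + s_1 s_2 \Big( f(t_1) \big( E[\varepsilon_1 I\{\varepsilon_1 \le t_2\}] + \tfrac{t_1}{2} E[(\varepsilon_1^2 - 1) I\{\varepsilon_1 \le t_2\}] \big)$
$\quad + f(t_2) \big( E[\varepsilon_1 I\{\varepsilon_1 \le t_1\}] + \tfrac{t_2}{2} E[(\varepsilon_1^2 - 1) I\{\varepsilon_1 \le t_1\}] \big)$
$\quad + f(t_1) f(t_2) \big( 1 + \tfrac{t_1 + t_2}{2} E[\varepsilon_1^3] + \tfrac{t_1 t_2}{4} (E[\varepsilon_1^4] - 1) \big) \Big)$.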
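
As a worked illustration (not part of the original statement), consider standard normal innovations, $F = \Phi$ with density $\varphi$, and assume the covariance of $K_F$ has the form that results from the expansion in Theorem 3.1, i.e. with the factors $t_i/2$ attached to the terms involving $\varepsilon_1^2 - 1$. Using $E[\varepsilon_1 I\{\varepsilon_1 \le t\}] = -\varphi(t)$, $E[(\varepsilon_1^2 - 1) I\{\varepsilon_1 \le t\}] = -t \varphi(t)$, $E[\varepsilon_1^3] = 0$ and $E[\varepsilon_1^4] - 1 = 2$, the covariance simplifies as follows.

```latex
\begin{align*}
\operatorname{Cov}\big(K_\Phi(s_1,t_1), K_\Phi(s_2,t_2)\big)
&= (s_1 \wedge s_2)\big(\Phi(t_1 \wedge t_2) - \Phi(t_1)\Phi(t_2)\big) \\
&\quad + s_1 s_2\, \varphi(t_1)\varphi(t_2)
   \Big( \big(-1 - \tfrac{t_1 t_2}{2}\big) + \big(-1 - \tfrac{t_1 t_2}{2}\big)
       + \big(1 + \tfrac{t_1 t_2}{2}\big) \Big) \\
&= (s_1 \wedge s_2)\big(\Phi(t_1 \wedge t_2) - \Phi(t_1)\Phi(t_2)\big)
   - s_1 s_2\, \varphi(t_1)\varphi(t_2)\Big(1 + \tfrac{t_1 t_2}{2}\Big).
\end{align*}
```

For $s_1 = s_2 = 1$ this is the familiar covariance of the empirical process with estimated mean and variance in the normal model, so in this special case the estimation step reduces the variance compared to the process built from the true innovations.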



Remark 3.4 From Theorem 3.1 and Corollary 3.3 it can be seen that the nonparametric
estimation of the autoregression and conditional variance function strongly influences the
asymptotic behaviour of the process. The asymptotic distribution of the partial sum
processes changes decisively when based on residuals compared to the corresponding processes
built from iid innovations. This is different from simpler situations in specific parametric
time series models, see Bai (1994) and Kreiß (1991), among others, but corresponds to
situations in parametric as well as nonparametric regression models, see e.g. Koul (2002) and
Neumeyer & Van Keilegom (2009). Note however that neither the chosen kernel function
nor the bandwidth have any influence on the asymptotic distribution. ■



The stochastic expansion given in Theorem 3.1 can be used to derive the asymptotic
distribution of the change point test. First we state weak convergence of the process defined
in (2.1). To this end in the following let $(G(s, z))_{s \in [0,1], z \in [0,1]}$ denote a completely tucked
Brownian sheet, i.e. a centered Gaussian process with covariance structure

$\mathrm{Cov}(G(s_1, z_1), G(s_2, z_2)) = (s_1 \wedge s_2 - s_1 s_2)(z_1 \wedge z_2 - z_1 z_2)$.

Theorem 3.5 Under model (AR) with the assumptions (K), (C), (I), (W), (F), (E), (X),
(Z), and (M) under the null hypothesis $H_0$ of no change point there exist Gaussian processes
$(G_n(s, F(t)))_{s \in [0,1], t \in \mathbb{R}}$, $n \in \mathbb{N}$, with the same distribution as $(G(s, F(t)))_{s \in [0,1], t \in \mathbb{R}}$ such that

$\sup_{s \in [0,1],\, t \in \mathbb{R}} \big| \hat{T}_n(s, t) - G_n(s, F(t)) \big| = o_P(1)$.

The proof is again given in appendix A as well as the proof of the next corollary in which
we state the asymptotic distribution of the change point test.

Corollary 3.6 Under the assumptions of Theorem 3.5 under the null hypothesis $H_0$ of no
change point the Kolmogorov-Smirnov type test statistic $\sup_{s \in [0,1],\, t \in \mathbb{R}} |\hat{T}_n(s, t)|$ converges in
distribution to $\sup_{s \in [0,1],\, z \in [0,1]} |G(s, z)|$.



Remark 3.7 From Corollary 3.6 it follows that the test is asymptotically distribution-free
although the stochastic expansion given in Theorem 3.1 still depends on the innovation
distribution in a complicated way. This remarkable feature in the context of procedures
based on nonparametric residual empirical processes was already observed by Neumeyer &
Van Keilegom (2009) in the context of independent observations. The critical values for the
test are tabulated in Picard (1985). ■
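
Because the limit $\sup_{s,z} |G(s, z)|$ does not depend on $F$, its quantiles can also be approximated by simulation. The following is a rough Monte Carlo sketch under stated assumptions: the function name `simulate_sup_tucked_sheet` is hypothetical, the sheet is built on a regular grid as $G(s,z) = W(s,z) - sW(1,z) - zW(s,1) + szW(1,1)$ from a Brownian sheet $W$, and the grid supremum slightly underestimates the true supremum. It is not meant to replace the tabulated values.

```python
import numpy as np


def simulate_sup_tucked_sheet(grid, reps, rng):
    """Simulate sup |G(s, z)| for a completely tucked Brownian sheet on a grid x grid lattice."""
    s = np.arange(1, grid + 1) / grid
    sups = np.empty(reps)
    for r in range(reps):
        # independent cell increments of a Brownian sheet, each with variance 1 / grid^2
        incr = rng.standard_normal((grid, grid)) / grid
        w = incr.cumsum(axis=0).cumsum(axis=1)            # w[a, b] = W(s_a, z_b)
        g = (w
             - np.outer(s, w[-1, :])                      # s * W(1, z)
             - np.outer(w[:, -1], s)                      # z * W(s, 1)
             + w[-1, -1] * np.outer(s, s))                # s * z * W(1, 1)
        sups[r] = np.abs(g).max()
    return sups
```

An approximate critical value at level 5% is then `np.quantile(sups, 0.95)`; finer grids and more repetitions reduce the discretization and Monte Carlo error.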



4 Asymptotic results under fixed alternatives 

The assumptions (K) and (M) as well as the following assumptions are used to prove
consistency of the test under fixed alternatives.

(C') The sequence of bandwidths $c_n$ fulfills

ncl ->■ 0, ( logn ]! for all r] > 0. 

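(I') For the interval $I_n = [a_n, b_n]$ there exists some $r_I < \infty$ such that $(b_n - a_n) = O((\log n)^{r_I})$.

(W') The weight function $w_n: \mathbb{R} \to [0, 1]$ is continuous and fulfills (2.2).

(F') Let $\varepsilon_j$ have distribution function $F_{\varepsilon_j}$ and density $f_{\varepsilon_j}$, $j \in \mathbb{Z}$. Let $\frac{1}{n} \sum_{j=1}^{n} \sup_{t \in \mathbb{R}} f_{\varepsilon_j}(t) =
O(1)$ as well as $\frac{1}{n} \sum_{j=1}^{n} \sup_{t \in \mathbb{R}} |f_{\varepsilon_j}(t) t| = O(1)$ for $n \to \infty$.

Remark: Under the alternative $H_1$ the assumption is fulfilled when $\sup_{t \in \mathbb{R}} f(t) < \infty$,
$\sup_{t \in \mathbb{R}} \tilde{f}(t) < \infty$, $\sup_{t \in \mathbb{R}} |f(t) t| < \infty$, and $\sup_{t \in \mathbb{R}} |\tilde{f}(t) t| < \infty$, where $f$ and $\tilde{f}$ are
densities corresponding to $F$ and $\tilde{F}$, respectively.

(E') It holds that $\frac{1}{n} \sum_{i=1}^{n} E[|X_i|^{2b}] = O(1)$ for some $b > 1 + \sqrt{3}$, $n \to \infty$.

(X') The observation process $(X_j)_{j \in \mathbb{Z}}$ is $\alpha$-mixing with mixing-coefficient $\alpha(n) = O(n^{-\beta})$
for some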

\ (l + V3)&-2(2 + V3)' 

The observation densities $f_{X_i}$ are four times differentiable and fulfill
$\sup_{x \in \mathbb{R}} n^{-1} \sum_{i=1}^{n} |f_{X_i}^{(\mu)}(x)| = O(1)$, $\mu = 0, 1, 2, 3, 4$. Moreover there exists some $r_f < \infty$
such that $q_n^f = \big( \inf_{x \in I_n} \frac{1}{n} \sum_{i=1}^{n} f_{X_i}(x) \big)^{-1} = O((\log n)^{r_f})$.

(Z') For all $m_n \le n$ with $m_n^{-1} = o(1)$ it holds that

sup xeJ „ ((|m(x)| + ^ max < s < n _ ffl „ E£s+i 



SU PxGJ„ ((M S )I + k(^)l) 2fc i maX 0<5< ? 

and there exists some 1 < j* < oo such that 
/ 

\k n //Mil / /M\fc 1 



i=5+l 



l+iSft])*- 1 /^)) =0(1) 



sup 

X .X ^.Jj\ 



and 



5+m„ 



(|m(x)i + |a(x)|) fc (|m(x / )| + Kar , )|)' 



max 



V 



mi 0<S<n-m„ 

n i,j=S+l 



£ / 

i=S+l 
\i-j\>3* 



X. 



i— 1>-Xj'— 1 



\ 

(x, a/) 



0(1) 



sup 



(\m(x)\ + \o~(x)\) 



i+j* 

max > 

j*+l<i<n-j* 

j=i-j" 



0(1) 



for $k = 1, 2$, $n \to \infty$ with
$J_n = [a_n - (C + c_n^{-1/2} n^{-1/2} (\log n)^{1/2}) c_n,\ b_n + (C + c_n^{-1/2} n^{-1/2} (\log n)^{1/2}) c_n]$.

Remark: It suffices when the assumption is valid for $m_n$ defined in (B.4) in the proof.

Under the alternative $H_1$ the arithmetic mean of all $f_{X_i}$ converges to the weighted
sum of the observation density before the change point and the long range observation
density after the change point with weights $\theta_0$ and $1 - \theta_0$, so (Z') is fulfilled if (Z) and
(Z) with the long range observation density instead of $f_{X_0}$ are fulfilled and the last part
of (Z') holds.

Remark 4.1 If the observations and the innovations are identically distributed it holds that
the second and third part of (X') are equivalent to the second and third part of (X) and (Z')
is equivalent to (Z). The other assumptions are not equivalent, even if the innovations
are identically distributed. In detail it holds that assumption (I') is weaker than (I), (F') is
weaker than (F), (E') is weaker than (E), the first part of (X') is weaker than the first part
of (X), and (C') is weaker than (C). ■

Remark 4.2 Note that under the alternative the process $(X_j)_{j \in \mathbb{Z}}$ is not stationary. Thus
to obtain consistency most auxiliary results in appendix B are proved without assuming
stationarity. A stabilisation of the density averages as in assumptions (F'), (X') is sufficient
for our results to hold. In particular we generalize some of Hansen's (2008) results in Lemma
B.1. ■

Theorem 4.3 Under the assumptions (K), (C'), (I'), (W'), (F'), (E'), (X'), (Z'), and (M)
under the fixed alternative $H_1$ with a change point in $\lfloor n\theta_0 \rfloor$, we have

$\sup_{t \in \mathbb{R}} \Big| \frac{\sum_{k=1}^{\lfloor n\theta_0 \rfloor} w_{nk}}{n} \big( \hat{F}_{\lfloor n\theta_0 \rfloor}(t) - F(t) \big) \Big| = o_P(1)$,

$\sup_{t \in \mathbb{R}} \Big| \frac{\sum_{k=\lfloor n\theta_0 \rfloor + 1}^{n} w_{nk}}{n} \big( \hat{F}^*_{n - \lfloor n\theta_0 \rfloor}(t) - \tilde{F}(t) \big) \Big| = o_P(1)$.

Corollary 4.4 Under the assumptions of Theorem 4.3 the Kolmogorov-Smirnov type test
based on the process $\hat{T}_n$ is consistent against fixed alternatives $H_1$.

The proofs of Theorem 4.3 and Corollary 4.4 are given in appendix A.



5 The homoscedastic AR-model 

In this section we consider a homoscedastic AR-model

(AR1) $X_j = m(X_{j-1}) + \varepsilon_j$,

where the innovations $\varepsilon_j$, $j \in \mathbb{Z}$, are independent with $E[\varepsilon_j] = 0$ and $E[\varepsilon_j^2] < \infty$ $\forall j$
and $\varepsilon_j$ is independent of the past $X_k$, $k \le j - 1$, $\forall j$.

Our aim is to test the change point hypotheses $H_0$ vs. $H_1$ from section 2. Note that here
under $H_1$ the change in the innovation distribution can result from a change in the variance.
The residuals are now defined as $\hat\varepsilon_j = X_j - \hat{m}(X_{j-1})$ and the test statistic is built with these
in the same way as described for the heteroscedastic case; see (2.1). Let assumptions ($\tilde{Z}$),
($\tilde{M}$) under the null hypothesis and assumption (Z'') under the alternative be formulated as
(Z), (M) in section 3 and (Z') in section 4 respectively, but replacing the variance function
$\sigma$ by a constant. Let ($\tilde{F}$) be formulated as (F), but replacing conditions $\sup_{t \in \mathbb{R}} |f(t) t| < \infty$,
$\sup_{t \in \mathbb{R}} |f'(t) t^2| < \infty$ by $\sup_{t \in \mathbb{R}} f(t) < \infty$, $\sup_{t \in \mathbb{R}} |f'(t) t| < \infty$. Let (F'') be formulated as (F'),
but deleting the last condition. Let ($\tilde{E}$) and (E'') be formulated as (E) and (E') respectively,
but replacing $2b$ by $b$. Then the following asymptotic results are valid.

Theorem 5.1 Under model (AR1) with assumptions (K), (C), (I), (W), ($\tilde{F}$), ($\tilde{E}$), (X),
($\tilde{Z}$), and ($\tilde{M}$) we have that under the null hypothesis $H_0$ of no change point

$\frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} w_{nj} \big( I\{\hat\varepsilon_j \le t\} - F(t) \big) = \frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} \big( I\{\varepsilon_j \le t\} - F(t) \big) + \frac{\lfloor ns \rfloor}{n} f(t) \frac{1}{n} \sum_{j=1}^{n} \varepsilon_j + o_P\big(\tfrac{1}{\sqrt{n}}\big)$

uniformly with respect to $s \in [0, 1]$ and $t \in \mathbb{R}$.

Corollary 5.2 Under the assumptions of Theorem 5.1 under the null hypothesis $H_0$ of no
change point the process

$\sqrt{n}\, \frac{\sum_{k=1}^{\lfloor ns \rfloor} w_{nk}}{n} \big( \hat{F}_{\lfloor ns \rfloor}(t) - F(t) \big)$, $s \in [0, 1]$, $t \in \mathbb{R}$,

converges weakly to a centered Gaussian process $(K_F(s, t))_{s \in [0,1], t \in \mathbb{R}}$ with

$\mathrm{Cov}(K_F(s_1, t_1), K_F(s_2, t_2)) = (s_1 \wedge s_2) \big( F(t_1 \wedge t_2) - F(t_1) F(t_2) \big)$
$\quad + s_1 s_2 \big( f(t_1) E[\varepsilon_1 I\{\varepsilon_1 \le t_2\}] + f(t_2) E[\varepsilon_1 I\{\varepsilon_1 \le t_1\}] + f(t_1) f(t_2) \mathrm{Var}(\varepsilon_1) \big)$.

Theorem 5.3 Under model (AR1) with assumptions (K), (C'), (I'), (W'), (F''), (E''), (X'),
(Z''), and ($\tilde{M}$) the Kolmogorov-Smirnov type test based on the process $\hat{T}_n$ is consistent against
fixed alternatives $H_1$.

The proofs are analogous to the proofs of the results in sections 3 and 4 but easier due
to the simpler structure of the model and the process. They are omitted for the sake of
brevity.

6 Small sample performance 
6.1 Simulations 

The heteroscedastic model. To examine the performance of the test on small samples
we considered AR(1) models and ARCH(1) models.
For the AR(1) case we considered the models

$X_j = 0.5 \cdot X_{j-1} + \varepsilon_j$, $\varepsilon_1, \dots, \varepsilon_{\lfloor n/2 \rfloor} \sim \mathcal{N}(0, 1)$, $\varepsilon_{\lfloor n/2 \rfloor + 1}, \dots, \varepsilon_n \sim F_1$ (respectively $F_2$),

where $F_1$ is the distribution function of a random variable that is $\mathcal{N}(-2\zeta, 1)$ distributed
with probability 0.5 and $\mathcal{N}(2\zeta, 1)$ distributed with probability 0.5 and $F_2$ is the distribution
function of a random variable that is $\mathcal{N}(0, (1 - \zeta)^2)$ distributed with probability 0.5 and
$\mathcal{N}(0, 2 - (1 - \zeta)^2)$ distributed with probability 0.5, for different values of $\zeta$. Though the
data are generated by a homoscedastic model we assume for the data analysis validity of the
heteroscedastic model (AR).

In Table 1 the rejection probabilities for 500 repetitions, level 5% and sample sizes $n \in
\{100, 200\}$ are shown. They are also shown in the left panel of Figure 1. It can be seen that
the level is approximated well and the power increases for increasing parameter $\zeta$ as well as
for increasing sample size $n$.
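
The two mixture alternatives can be sampled as in the following sketch. The function names are hypothetical, and the check `f2_variance` only restates the arithmetic $0.5\,(1-\zeta)^2 + 0.5\,(2 - (1-\zeta)^2) = 1$, which shows that $F_2$ keeps unit variance for every $\zeta$.

```python
import numpy as np


def sample_f1(size, zeta, rng):
    """Draw from F_1: N(-2*zeta, 1) or N(2*zeta, 1), each with probability 0.5."""
    signs = rng.choice([-1.0, 1.0], size=size)
    return signs * 2.0 * zeta + rng.standard_normal(size)


def sample_f2(size, zeta, rng):
    """Draw from F_2: N(0, (1-zeta)^2) or N(0, 2-(1-zeta)^2), each with probability 0.5."""
    var_a = (1.0 - zeta) ** 2
    var_b = 2.0 - var_a
    pick_a = rng.random(size) < 0.5
    sd = np.where(pick_a, np.sqrt(var_a), np.sqrt(var_b))
    return sd * rng.standard_normal(size)


def f2_variance(zeta):
    """Variance of F_2 as the average of the two component variances (both means are zero)."""
    return 0.5 * (1.0 - zeta) ** 2 + 0.5 * (2.0 - (1.0 - zeta) ** 2)
```

For $\zeta = 0$ both samplers reduce to the standard normal distribution, which corresponds to the null hypothesis column of the tables.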



% / $\zeta$         0      0.1    0.2    0.3    0.4    0.6    0.8    1
$F_1$, n = 100      5      6.2    7      9.2    10.2   16.4   29     41
$F_1$, n = 200      5.8    9.8    10.6   12.4   16.2   40.2   78.2   92.4

% / $\zeta$         0      0.1    0.2    0.3    0.4    0.6    0.8    0.99
$F_2$, n = 100      5      6.8    6.4    7      7.2    10.8   18     28.2
$F_2$, n = 200      5.8    8.8    9.8    9.6    11.2   18.4   41.6   63.4

Table 1: Rejection probabilities obtained from AR(1) models

The same kind of change points was examined for ARCH(1) models

$X_j = \sqrt{0.75 + 0.25 X_{j-1}^2} \cdot \varepsilon_j$, $\varepsilon_1, \dots, \varepsilon_{\lfloor n/2 \rfloor} \sim \mathcal{N}(0, 1)$, $\varepsilon_{\lfloor n/2 \rfloor + 1}, \dots, \varepsilon_n \sim F_1$ (respectively $F_2$)

with rejection probabilities for 500 repetitions and level 5% as displayed in Table 2 and in
the right panel of Figure 1.



% / $\zeta$         0      0.1    0.2    0.3    0.4    0.6    0.8    1
$F_1$, n = 100      4.8    6.4    6.6    7.8    8      15.6   29.6   56.8
$F_1$, n = 200      5      8.4    9.2    11.2   14.6   37.2   55.4   80

% / $\zeta$         0      0.1    0.2    0.3    0.4    0.6    0.8    0.99
$F_2$, n = 100      4.8    6.8    7.6    8      7.4    10.2   17.6   42.2
$F_2$, n = 200      5      8.2    10     8.6    11.4   19.4   43.2   78.8

Table 2: Rejection probabilities obtained from ARCH(1) models

Figure 1: Rejection probabilities obtained from AR(1) (left) and ARCH(1) (right) models
for n = 100 (dashed curves) and n = 200 (solid curves). The thick curves represent the
results for the model with $F_1$, the thin curves the results for the model with $F_2$.



We also considered innovations with Student-t distribution with three degrees of freedom.
The Student-t distribution has heavier tails than the normal distribution and is therefore
more appropriate for modeling financial data. Due to the fact that $\mathrm{Var}(\varepsilon_j)$ has to be one for
all $j$, the Student-t distribution was standardized. We considered the ARCH(1) models

$X_j = \sqrt{0.75 + 0.25 X_{j-1}^2} \cdot \varepsilon_j$, $\varepsilon_1, \dots, \varepsilon_{\lfloor n/2 \rfloor} \sim St(3)$, $\varepsilon_{\lfloor n/2 \rfloor + 1}, \dots, \varepsilon_n \sim St(3 + 10\zeta)$

for different values of $\zeta$. The rejection probabilities for 500 repetitions, level 5% and sample
sizes $n \in \{100, 200, 500\}$ are shown in Table 3 and in Figure 2.

% / $\zeta$         0      0.1    0.2    0.3    0.4    0.6    0.8    1
n = 100             5.4    6.8    8.2    8.4    10.6   10     9.6    9.4
n = 200             6      9.6    12.8   14.6   17.2   17     18.8   19.8
n = 500             6.8    14.6   26.6   28.8   33     39.4   42.8   46

Table 3: Rejection probabilities obtained from ARCH(1) models with $St(3 + 10\zeta)$ distributed
innovations

The asymptotic level is approximated reasonably well and the power increases with
increasing $\zeta$ as well as with increasing $n$. Here the increase with $\zeta$ for small $n$ is not as
pronounced as for the models considered before, because the difference between the
distributions before and after the change point for $\zeta = 0.5$ is only slightly different from that
for $\zeta = 1$, since the Student-t distribution converges to the standard normal distribution as
the degrees of freedom increase.
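
The standardization of the Student-t innovations can be sketched as follows, assuming it is done by rescaling to unit variance: a t distribution with $\nu > 2$ degrees of freedom has variance $\nu/(\nu - 2)$, so multiplying by $\sqrt{(\nu - 2)/\nu}$ gives variance one. The function names are introduced for this illustration only.

```python
import numpy as np


def standardization_factor(df):
    """Factor sqrt((df - 2) / df) that rescales a t(df) variable to unit variance."""
    if df <= 2:
        raise ValueError("the variance of the t distribution is finite only for df > 2")
    return np.sqrt((df - 2.0) / df)


def sample_standardized_t(size, df, rng):
    """Draw Student-t variables with df degrees of freedom, rescaled to variance one."""
    return rng.standard_t(df, size=size) * standardization_factor(df)
```

With three degrees of freedom the factor is $\sqrt{1/3}$, and the distribution after the change point would use `df = 3 + 10 * zeta`.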
We also examined the following ARCH(1) models:

$X_j = \sqrt{0.75 + 0.25 X_{j-1}^2} \cdot \varepsilon_j$, $\varepsilon_1, \dots, \varepsilon_{\lfloor n/2 \rfloor} \sim St(3)$, $\varepsilon_{\lfloor n/2 \rfloor + 1}, \dots, \varepsilon_n \sim F_3$ (respectively $F_4$),

where $F_3$ is the distribution function of a random variable that is $(St(3) - 2\zeta)$ distributed
with probability 0.5 and $(St(3) + 2\zeta)$ distributed with probability 0.5 and $F_4$ is the
distribution function of a random variable that is $(1 - \zeta) \cdot St(3)$ distributed with probability 0.5
and $\sqrt{2 - (1 - \zeta)^2} \cdot St(3)$ distributed with probability 0.5, for different values of $\zeta$.
The rejection probabilities for 500 repetitions and level 5% are shown in Table 4 and Figure 3.

% / $\zeta$         0      0.1    0.2    0.3    0.4    0.6    0.8    1
$F_3$, n = 100      5.4    7      7.4    12.2   20.2   32.8   34     62.4
$F_3$, n = 200      6      9.2    12.4   25     51.2   82     78     84.6

% / $\zeta$         0      0.1    0.2    0.3    0.4    0.6    0.8    0.99
$F_4$, n = 100      5.4    8      8.8    8.2    8.4    11.2   18.6   37.8
$F_4$, n = 200      6      8.6    10.2   9.2    11.6   15     39.8   77.4

Table 4: Rejection probabilities obtained from ARCH(1) models



The models with Student-t distributed innovations with three degrees of freedom do not
fulfill the moment assumptions, because moments of order three or higher do not exist,
but the simulations show that the test still works for them.

Figure 2: Rejection probabilities obtained from ARCH(1) models with $St(3 + 10\zeta)$
distributed innovations for n = 100 (dashed curve), n = 200 (solid curve) and n = 500
(dotted curve).

Figure 3: Rejection probabilities obtained from ARCH(1) models for n = 100 (dashed
curve) and n = 200 (solid curve). The thick curves represent the results for the model
with $F_3$, the thin curves the results for the model with $F_4$.



Finally, we considered the skew-normal distribution as innovation distribution. Let $F_5$
denote the skew-normal distribution with location parameter

$-\sqrt{\frac{2\pi \big( (10\zeta)^2 + (10\zeta)^4 \big)}{\pi^2 + (2\pi^2 - 2\pi)(10\zeta)^2 + (\pi^2 - 2\pi)(10\zeta)^4}}$,

scale parameter $(\pi (1 + (10\zeta)^2))^{1/2} / (\pi + (\pi - 2)(10\zeta)^2)^{1/2}$ and shape parameter $10\zeta$. We
considered the AR(1) and ARCH(1) models

$X_j = 0.5 \cdot X_{j-1} + \varepsilon_j$, $\varepsilon_1, \dots, \varepsilon_{\lfloor n/2 \rfloor} \sim \mathcal{N}(0, 1)$, $\varepsilon_{\lfloor n/2 \rfloor + 1}, \dots, \varepsilon_n \sim F_5$

and

$X_j = \sqrt{0.75 + 0.25 X_{j-1}^2} \cdot \varepsilon_j$, $\varepsilon_1, \dots, \varepsilon_{\lfloor n/2 \rfloor} \sim \mathcal{N}(0, 1)$, $\varepsilon_{\lfloor n/2 \rfloor + 1}, \dots, \varepsilon_n \sim F_5$

for different values of $\zeta$. The parameters in the skew-normal distribution were chosen in
this way to guarantee $E[\varepsilon_j] = 0$ and $\mathrm{Var}(\varepsilon_j) = 1$ for all $j$. The rejection probabilities for 500
repetitions and level 5% are shown in Table 5 and Figure 4.
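
That this parameter choice yields mean zero and variance one can be checked with the standard moment formulas of the skew-normal distribution with location $\xi$, scale $\omega$ and shape $\alpha$: with $\delta = \alpha / \sqrt{1 + \alpha^2}$, the mean is $\xi + \omega \delta \sqrt{2/\pi}$ and the variance is $\omega^2 (1 - 2\delta^2/\pi)$. The sketch below uses hypothetical function names and writes the location in the algebraically simplified form $-\sqrt{2\alpha^2 / (\pi + (\pi - 2)\alpha^2)}$ with $\alpha = 10\zeta$, which equals the expression given for the location parameter.

```python
import math


def skew_normal_params(zeta):
    """Location, scale and shape of F_5 for a given zeta (shape alpha = 10 * zeta)."""
    alpha = 10.0 * zeta
    a2 = alpha ** 2
    denom = math.pi + (math.pi - 2.0) * a2
    scale = math.sqrt(math.pi * (1.0 + a2) / denom)
    location = -math.sqrt(2.0 * a2 / denom)       # simplified form of the location parameter
    return location, scale, alpha


def skew_normal_mean_var(location, scale, alpha):
    """Mean and variance of a skew-normal distribution from its three parameters."""
    delta = alpha / math.sqrt(1.0 + alpha ** 2)
    mean = location + scale * delta * math.sqrt(2.0 / math.pi)
    var = scale ** 2 * (1.0 - 2.0 * delta ** 2 / math.pi)
    return mean, var
```

For $\zeta = 0$ the parameters reduce to location 0, scale 1 and shape 0, i.e. the standard normal distribution.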

The homoscedastic model. For the homoscedastic model $X_j = m(X_{j-1}) + \varepsilon_j$ as
considered in section 5 only $\mathrm{Var}(\varepsilon_j) < \infty$ $\forall j$ is assumed so that we can simulate a change in
the variance. To this end we generated data from the AR(1) model

$X_j = 0.5 \cdot X_{j-1} + \varepsilon_j$, $\varepsilon_1, \dots, \varepsilon_{\lfloor n/2 \rfloor} \sim \mathcal{N}(0, 0.5^2)$, $\varepsilon_{\lfloor n/2 \rfloor + 1}, \dots, \varepsilon_n \sim \mathcal{N}(0, (0.5 + \zeta)^2)$

% / $\zeta$         0      0.1    0.2    0.3    0.4    0.6    0.8    1


ar fi "i -n — inn 


u 


7 


7 9 


o. u 


1 1 4 

X X . t: 


Q 8 

C/ .o 


19 9 

XZj .Zj 


1 8 


ab/i ~\ n — onn 


5 8 


8 4 


1 9 

X W . Zj 


14 8 

It:, O 


1 5 9 

X tj . Zj 


1 7 9 


90 8 
z^u .o 


91 fi 

_ X . VJ 


A on ~\ n — r.nn 

.TtXiJ 1 1 j 11 — OVJ\J 


7 8 




1 7 2 

JL 1 . 


23 4 


98 8 

Zj(J . (J 


34 8 


36 8 


38 8 


ARCH(1), n = 100    4.8    7.6    8      10     11.8   11     12     12.2
ARCH(1), n = 200    5      7.8    11.8   12     15.2   18.8   20.6   18.6
ARCH(1), n = 500    7.4    9      14.8   21.8   30     35.4   36.6   38

Table 5: Rejection probabilities obtained from AR(1) and ARCH(1) models with
skew-normal distributed innovations

Figure 4: Rejection probabilities obtained from AR(1) (left) and ARCH(1) (right) models
with skew-normal distributed innovations for n = 100 (dashed curve), n = 200 (solid curve)
and n = 500 (dotted curve).



for different values of $\zeta$.

The rejection probabilities for 500 repetitions and level 5% are shown in Table 6 and in
Figure 5.



%           $\zeta = 0$   0.1     0.2     0.3     0.4     0.6     0.8     1
n = 100     4.6           6.8     8.6     9.8     14.4    20      29.4    39
n = 200     5.6           9       17.2    25.2    40.2    68.2    85.2    95.2


Table 6: Rejection probabilities obtained from homoscedastic AR(1) models with change in variance

Figure 5: Rejection probabilities obtained from homoscedastic AR(1) models with change in variance for n = 100 (dashed curve) and n = 200 (solid curve).



It can be seen that the theoretical results are supported by the simulations. 



Simulation setting. For each simulation $10 \cdot n$ observations $X_j$ were generated, $9.5 \cdot n$ with the distribution before and $0.5 \cdot n$ with the distribution after the change point. Only the last $n$ observations were used for the test. This was done to ensure that the process is in balance.
The empirical processes were built without the weight function $w_n$, which means that $I_n$ was chosen as the real line. This is contrary to the assumptions. Nevertheless the simulations support our theoretical results very well, which suggests that the weight function is needed for the theory but that the test can be used without it.
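The burn-in scheme described above can be sketched for the variance-change AR(1) model as follows. The function name `simulate_with_burn_in` and the starting value 0 are assumptions made for illustration only; the sketch assumes an even sample size so that $9.5 \cdot n$ is an integer.

```python
import random


def simulate_with_burn_in(n, zeta, seed=0):
    """Generate 10*n AR(1) observations and keep only the last n.

    The first 9.5*n innovations are N(0, 0.5^2), the last 0.5*n are
    N(0, (0.5 + zeta)^2), so the change point sits in the middle of
    the retained sample.
    """
    rng = random.Random(seed)
    total = 10 * n
    n_before = total - n // 2  # equals 9.5 * n for even n
    x_prev, xs = 0.0, []
    for j in range(total):
        sd = 0.5 if j < n_before else 0.5 + zeta
        x = 0.5 * x_prev + rng.gauss(0.0, sd)
        xs.append(x)
        x_prev = x
    return xs[-n:]
```

Discarding the first $9 \cdot n$ observations removes the influence of the arbitrary starting value on the retained sample.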

The Nadaraya-Watson estimators $\hat m$ and $\hat\sigma$ were calculated with the Gaussian kernel and bandwidth $c_n = n^{-1/4}$. This is also not compatible with all assumptions; e.g. the support of the kernel is not compact. However, this has a negligible effect on the simulations because the Gaussian kernel decreases exponentially fast in the tails. The choice of bandwidth is not compatible with the assumptions either, because it does not converge faster than $n^{-1/4}$. A compatible choice would be $c_n = n^{-1/4} (\log n)^{-r}$ for some adequate $0 < r < \infty$, but for the small sample sizes that were used the logarithm would be too strong in comparison to $n^{-1/4}$, so we omitted it.
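A minimal sketch of how residuals could be computed from such Nadaraya-Watson estimators is given below. It assumes the unweighted estimators with Gaussian kernel, the variance estimator in the form "smoothed second moment minus squared $\hat m$" as in (A.6), and a default bandwidth $n^{-1/4}$; the function name `nw_residuals` and the numerical guard are illustrative choices, not part of the paper.

```python
import math


def nw_residuals(xs, bandwidth=None):
    """Residuals (X_j - m_hat(X_{j-1})) / sigma_hat(X_{j-1}) for j = 1..n.

    `xs` holds X_0, ..., X_n. The default bandwidth n**(-1/4) is an
    assumption taken from the simulation setting.
    """
    lagged, current = xs[:-1], xs[1:]
    n = len(current)
    c_n = bandwidth if bandwidth is not None else n ** (-0.25)
    residuals = []
    for x0, xj in zip(lagged, current):
        # Gaussian kernel weights K((x0 - X_{i-1}) / c_n); constants cancel.
        weights = [math.exp(-0.5 * ((x0 - xi) / c_n) ** 2) for xi in lagged]
        total = sum(weights)  # positive, since x0 itself is in `lagged`
        m_hat = sum(w * y for w, y in zip(weights, current)) / total
        second = sum(w * y * y for w, y in zip(weights, current)) / total
        var_hat = max(second - m_hat ** 2, 1e-12)  # guard against rounding
        residuals.append((xj - m_hat) / math.sqrt(var_hat))
    return residuals
```

The double loop costs $O(n^2)$ kernel evaluations, which is unproblematic for the sample sizes used in the simulations.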

To study the influence of the size of the bandwidth we simulated the first AR(1) model (with $F_\zeta$) with $c_n = c \cdot n^{-1/4}$ for different values of $c \in \mathbb{R}_{>0}$. The results are shown in Figure 6. It can be seen that the rejection probability increases with $c$, especially for $\zeta > 0.5$, but also the rejection probability under the null hypothesis increases with $c$.

Figure 6: Rejection probabilities obtained from AR(1) models with $F_\zeta$ for different sizes of bandwidth and n = 200.



6.2 Real data applications 

We also applied our new test to real datasets. First we examined the quarterly GNP (Gross National Product) of the USA in billions of dollars from 1947(1) to 2002(3). The data have been seasonally adjusted. We looked at the difference of the logarithm of the GNP, which is naturally interpreted as the growth rate of GNP. Figure 7 shows that there might be a structural break in the data, and indeed our testing procedure rejects the null hypothesis of no change point with p-value smaller than 0.001. The vertical line marks the point $\lfloor ns \rfloor$ at which the test process $T_n(s,t)$ is maximal. For these data we used the test statistic for the homoscedastic case in addition to the one for the heteroscedastic case, because the plot suggests that there might be some change in the variance. Both tests delivered the same value of the test statistic, approximately 1.392.
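The computation of such a statistic and of the maximizing index can be sketched as follows. The sketch assumes the unweighted form of the process, $T_n(s,t) = \sqrt{n}\,\frac{\lfloor ns\rfloor}{n}\bigl(1 - \frac{\lfloor ns\rfloor}{n}\bigr)$ times the difference of the two empirical distribution functions of the residuals, which equals $n^{-1/2}\bigl(\sum_{j \le k} I\{\hat\varepsilon_j \le t\} - \frac{k}{n}\sum_{j \le n} I\{\hat\varepsilon_j \le t\}\bigr)$ for $k = \lfloor ns \rfloor$. Since the test is asymptotically distribution-free, one way to approximate critical values is to evaluate the same statistic on i.i.d. uniform samples; this calibration is an assumption of the sketch and not necessarily the procedure used for the reported p-values. The names `ks_change_statistic` and `null_quantile` are hypothetical.

```python
import bisect
import math
import random


def ks_change_statistic(residuals):
    """Return (sup |T_n|, k_hat) for the unweighted two-sample KS process."""
    n = len(residuals)
    thresholds = sorted(residuals)
    # totals[i] = number of residuals <= thresholds[i]
    totals = [bisect.bisect_right(thresholds, t) for t in thresholds]
    counts = [0] * n  # counts[i] = #{j <= k : residual_j <= thresholds[i]}
    best, k_hat = 0.0, 0
    for k in range(1, n):
        first = bisect.bisect_left(thresholds, residuals[k - 1])
        for i in range(first, n):
            counts[i] += 1
        frac = k / n
        value = max(abs(c - frac * b) for c, b in zip(counts, totals))
        value /= math.sqrt(n)
        if value > best:
            best, k_hat = value, k
    return best, k_hat


def null_quantile(n, reps=200, level=0.95, seed=0):
    """Monte Carlo quantile of the statistic for i.i.d. uniform samples."""
    rng = random.Random(seed)
    stats = sorted(
        ks_change_statistic([rng.random() for _ in range(n)])[0]
        for _ in range(reps)
    )
    return stats[min(reps - 1, int(math.ceil(level * reps)) - 1)]
```

The returned `k_hat` corresponds to the point $\lfloor ns \rfloor$ at which the process is maximal, i.e. the position marked by the vertical line in the figures.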

The same data were examined in Shao & Zhang (2010) with a CUSUM-type test that is based on a self-normalization method. They tested for a possible change in the marginal
variance, 75% quantile and 25% quantile of the observations, and the test for a change in 
the 75% quantile rejected the null hypothesis of no change point with p-value smaller than 
0.001. The tests for a change in the marginal variance and 25% quantile did not reject the 
null hypothesis of no change point. The p-values for these were greater than 0.1. 
Shumway & Stoffer (2006) also examined these data and used stationary time series models, 
such as AR(1) and MA(2), to fit them. Both model fits pass their diagnostic checking tests, 
but our results, as well as the results of Shao & Zhang (2010), indicate that the data might 
not be a stable process but contain a change point. 

Another dataset that we examined is the daily log-return of the S&P 500 index, a well-known stock index that is quoted at the New York Stock Exchange, from July 1st 1998 to June 30th 2006. Figure 8 shows that there might be a structural break in these data as well, which is confirmed by our testing procedure with p-value smaller than 0.001. Again the vertical line marks the point $\lfloor ns \rfloor$ at which the test process $T_n(s,t)$ is maximal. As in the GNP example we used the test statistics for both cases (hetero- and homoscedastic), and both delivered nearly the same value of the test statistic, approximately 1.578 for the heteroscedastic and 1.575 for the homoscedastic case.

We examined these data although it is known that for financial data higher moments often do not exist, because our simulation study with Student-t distributed innovations shows that the testing procedure works even if the moment assumption is not fulfilled.
The S&P 500 data were also examined by Kirch & Tadjuidje Kamgaing (2012). They used a testing procedure based on cumulative sums of parametrically estimated residuals for nonlinear autoregressive models. Instead of log-returns they used transformed squared log-returns, and they also reject the null hypothesis of no change point. They do not give a p-value but reject clearly at level 5%.



7 Concluding remarks and outlook 

In this paper we transferred classical ideas of testing for change points in samples of indepen- 
dent observations to testing for change points in the innovation distribution in nonparamet- 
ric autoregressive models with conditional heteroscedasticity. To this end we considered the 
sequential empirical process of estimated innovations, proved an asymptotic expansion and 
weak convergence. We showed that the classical Kolmogorov-Smirnov test for a change point 
is asymptotically distribution-free in the new context and is not influenced asymptotically 
by the nonparametric estimation of the innovations. We proved consistency of the test under 
fixed alternatives and demonstrated the good performance in a simulation study. The proofs 
are based on empirical process theory for time series data and require the development of 
several technical auxiliary results. In particular we prove uniform rates for kernel estimators 
and their derivatives under nonstationarity assumptions. 

It is the topic of a future project to apply the theory developed here to test for serial independence of innovations or independence of the current innovation and past observations or covariates in nonparametric time series regression models. Moreover our aim is to model $k$-dimensional joint innovation distributions in multivariate time series models.



A Proofs: main results 



In this section we give the proofs for Theorems 3.1, 3.5, 4.3 and Corollaries 3.3, 3.6, 4.4, whereas some auxiliary results (Lemmata B.1 to B.6) are stated and proved in Section B. In some proofs standard arguments are given in condensed form for the sake of brevity. All details can be found in Selk (2011).

Figure 7: U.S. GNP quarterly growth rate

Figure 8: S&P 500 daily log-return

A.1 Proofs for results under the null hypothesis



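Proof of Theorem 3.1. From Lemma B.4 and Lemma B.5 it follows that under $H_0$, uniformly with respect to $s \in [0,1]$ and $t \in \mathbb{R}$,

\[
\begin{aligned}
\frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} w_{nj} \bigl( I\{\hat\varepsilon_j \le t\} - F(t) \bigr)
&= \frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} w_{nj} \bigl( I\{\varepsilon_j \le t\} - F(t) \bigr) + R_n(s,t) + o_P\Bigl(\frac{1}{\sqrt{n}}\Bigr) \\
&= \frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} \bigl( I\{\varepsilon_j \le t\} - F(t) \bigr) + R_n(s,t) + o_P\Bigl(\frac{1}{\sqrt{n}}\Bigr),
\end{aligned}
\tag{A.1}
\]

where for

\[
R_n(s,t) = \frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} w_{nj} \Bigl( F\Bigl( \frac{\hat m - m}{\sigma}(X_{j-1}) + t\, \frac{\hat\sigma}{\sigma}(X_{j-1}) \Bigr) - F(t) \Bigr)
\tag{A.2}
\]

it is straightforward to show by a first order Taylor expansion applying assumption (F) as well as Lemma B.2 that

\[
R_n(s,t) = f(t)\, \frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} w_{nj} \Bigl( \frac{\hat m - m}{\sigma}(X_{j-1}) + t\, \frac{\hat\sigma - \sigma}{\sigma}(X_{j-1}) \Bigr) + o_P\Bigl(\frac{1}{\sqrt{n}}\Bigr)
\tag{A.3}
\]

uniformly with respect to $s$ and $t$. Now inserting the definition of $\hat m$ from (2.3) we have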

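\[
\begin{aligned}
\frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} w_{nj}\, \frac{\hat m - m}{\sigma}(X_{j-1})
&= \frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} \frac{w_{nj}}{\sigma(X_{j-1})}\,
\frac{\sum_{i=1}^{n} K\bigl(\frac{X_{j-1} - X_{i-1}}{c_n}\bigr) \bigl( X_i - m(X_{j-1}) \bigr)}{\sum_{k=1}^{n} K\bigl(\frac{X_{j-1} - X_{k-1}}{c_n}\bigr)} \\
&= \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i \; \frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} w_{nj} + o_P\Bigl(\frac{1}{\sqrt{n}}\Bigr)
\end{aligned}
\]

uniformly with respect to $s$, which follows from Lemma B.3 (i)-(iii). Now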

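\[
\sup_{s \in [0,1]} \Bigl| \frac{\sum_{k=1}^{\lfloor ns \rfloor} w_{nk}}{n} - \frac{\lfloor ns \rfloor}{n} \Bigr| = o_P(1)
\tag{A.4}
\]

can be shown by Chebyshev's inequality and we obtain

\[
\frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} w_{nj}\, \frac{\hat m - m}{\sigma}(X_{j-1}) = \frac{\lfloor ns \rfloor}{n}\, \frac{1}{n} \sum_{j=1}^{n} \varepsilon_j + o_P\Bigl(\frac{1}{\sqrt{n}}\Bigr).
\tag{A.5}
\]

By an application of

\[
\frac{\hat\sigma - \sigma}{\sigma} = \frac{\hat\sigma^2 - \sigma^2}{2\sigma^2} - \frac{(\hat\sigma - \sigma)^2}{2\sigma^2}
\]

and Lemma B.2, for the second term arising from (A.3) we have

\[
\frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} w_{nj}\, \frac{\hat\sigma - \sigma}{\sigma}(X_{j-1}) = \frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} w_{nj}\, \frac{\hat\sigma^2 - \sigma^2}{2\sigma^2}(X_{j-1}) + o_P\Bigl(\frac{1}{\sqrt{n}}\Bigr).
\]

Now noting that

\[
\hat\sigma^2(x) = \frac{\sum_{i=1}^{n} K\bigl(\frac{x - X_{i-1}}{c_n}\bigr) X_i^2}{\sum_{i=1}^{n} K\bigl(\frac{x - X_{i-1}}{c_n}\bigr)} - (\hat m(x))^2,
\tag{A.6}
\]

similarly to the derivation of (A.5) with results similar to those in Lemma B.3 one obtains

\[
\frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} w_{nj}\, \frac{\hat\sigma - \sigma}{\sigma}(X_{j-1}) = \frac{\lfloor ns \rfloor}{n}\, \frac{1}{n} \sum_{j=1}^{n} \frac{\varepsilon_j^2 - 1}{2} + o_P\Bigl(\frac{1}{\sqrt{n}}\Bigr).
\tag{A.7}
\]

Finally from (A.3), (A.5) and (A.7) one has

\[
R_n(s,t) = f(t)\, \frac{\lfloor ns \rfloor}{n}\, \frac{1}{n} \sum_{j=1}^{n} \varepsilon_j + f(t)\, t\, \frac{\lfloor ns \rfloor}{n}\, \frac{1}{n} \sum_{j=1}^{n} \frac{\varepsilon_j^2 - 1}{2} + o_P\Bigl(\frac{1}{\sqrt{n}}\Bigr)
\]

and the assertion follows from this equality and (A.1). □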



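Proof of Corollary 3.3. By an application of Theorem 3.1 it remains to show that

\[
\begin{aligned}
&\frac{1}{\sqrt{n}} \Bigl( \sum_{j=1}^{\lfloor ns \rfloor} \bigl( I\{\varepsilon_j \le t\} - F(t) \bigr) + \frac{\lfloor ns \rfloor}{n} f(t) \sum_{j=1}^{n} \varepsilon_j + \frac{\lfloor ns \rfloor}{n} \frac{f(t)\, t}{2} \sum_{j=1}^{n} (\varepsilon_j^2 - 1) \Bigr) \\
&\qquad = \sum_{j=1}^{n} \bigl( Z_{nj}(s, t, f(t), f(t)t) - E[ Z_{nj}(s, t, f(t), f(t)t) ] \bigr)
\end{aligned}
\tag{A.8}
\]

converges weakly to the process $(K_F(s,t))_{s \in [0,1], t \in \mathbb{R}}$. Here we use the notations

\[
Z_{nj}(s,t,u,v) = n^{-1/2} \Bigl( I\{\varepsilon_j \le t\}\, I\bigl\{\tfrac{j}{n} \le s\bigr\} + \frac{\lfloor ns \rfloor}{n} \Bigl( u\, \varepsilon_j + \frac{v}{2} (\varepsilon_j^2 - 1) \Bigr) \Bigr), \qquad (s,t,u,v) \in \mathcal{F},
\]

where

\[
\mathcal{F} = \Bigl\{ (s,t,u,v) : s \in [0,1],\ t \in \mathbb{R},\ u \in \bigl[ 0, \sup_{t \in \mathbb{R}} f(t) \bigr],\ v \in \bigl[ -\sup_{t \in \mathbb{R}} |f(t)t|, \sup_{t \in \mathbb{R}} |f(t)t| \bigr] \Bigr\}
\]

is equipped with the semi-metric

\[
\rho((s,t,u,v),(s',t',u',v')) = |s - s'| + |F(t) - F(t')| + |u - u'| + |v - v'|
\]

and is totally bounded. By an application of Theorem 2.11.9 in van der Vaart & Wellner (1996) one can show weak convergence of the process

\[
\sum_{j=1}^{n} \bigl( Z_{nj}(s,t,u,v) - E[ Z_{nj}(s,t,u,v) ] \bigr), \qquad (s,t,u,v) \in \mathcal{F},
\]

to a centered Gaussian process. Details are omitted for the sake of brevity, but the arguments are similar to (but simpler than) those in the proofs in Neumeyer & Van Keilegom (2009) (see their proof of Theorem 3 and the online supporting information).

The assertion now follows by the continuous mapping theorem applied to the projections $u = f(t)$ and $v = f(t)t$ and by a straightforward calculation of the asymptotic covariance.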
□

Proof of Theorem 3.5. For the process $T_n$ defined in (2.1) we have by a straightforward calculation

T n (s,t) 



i=i i=L«sJ+i 
£ (/ ft < t} - F {t)) - E 

(1 ^ I I 1 71 \ 

- E ( J ^ - ^(*)) - ^r- E ^ ( J ^ *> - F W) + °*d) 
ft I V I V I 
3=1 3=1 ) 



WnjI {ij < t} 



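uniformly with respect to $s$ and $t$. Here the last equality follows from (A.4). Now inserting the expansion given in Theorem 3.1 we directly obtain

\[
T_n(s,t) = \sqrt{n}\, \frac{\lfloor ns \rfloor}{n} \Bigl( 1 - \frac{\lfloor ns \rfloor}{n} \Bigr) \Bigl( \frac{1}{\lfloor ns \rfloor} \sum_{j=1}^{\lfloor ns \rfloor} I\{\varepsilon_j \le t\} - \frac{1}{n - \lfloor ns \rfloor} \sum_{j=\lfloor ns \rfloor + 1}^{n} I\{\varepsilon_j \le t\} \Bigr) + o_P(1)
\]

uniformly with respect to $s$ and $t$. The assertion follows from Remark 2 in Neumeyer & Van Keilegom (2009), which goes back to Theorem 3.1 by Csorgo, Horvath & Szyszkowicz (1997). □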



Proof of Corollary 3.6. It directly follows from Theorem 3.5 and the continuous mapping theorem that

\[
\sup_{s \in [0,1],\, t \in \mathbb{R}} |T_n(s,t)| \xrightarrow[n \to \infty]{\mathcal{D}} \sup_{s \in [0,1],\, t \in \mathbb{R}} |G(s, F(t))| = \sup_{s \in [0,1],\, z \in [0,1]} |G(s,z)|,
\]

where the last equality holds by continuity of $F$. □



A.2 Proofs for results under fixed alternatives



Proof of Theorem 4.3. Assume that $H_1$ is valid with a change point in $\lfloor n\theta_0 \rfloor$. Analogously to the proof of Theorem 3.1 we have from Lemma B.4 that uniformly with respect to $t \in \mathbb{R}$

^ k=1 n nk (w*) - m) = - E w ^(m ^ o - m) 

1 L^oJ 

= - E w ^ ( J & ^ *> " F W) + R ^ *) + Ml) 



J R n (^ ,t) + o P (l), 
where the last equality follows from Lemma B.6. Here, for $R_n$ as defined in (A.2) one has by the mean value theorem applying assumption (F') and Lemma B.2 that

\[
\sup_{t \in \mathbb{R}} |R_n(\theta_0, t)| \le \sup_{x \in I_n} \Bigl| \frac{\hat m - m}{\sigma}(x) \Bigr| \sup_{t \in \mathbb{R}} |f(t)| + \sup_{x \in I_n} \Bigl| \frac{\hat\sigma - \sigma}{\sigma}(x) \Bigr| \sup_{t \in \mathbb{R}} |f(t)t|\; O_P(1) = o_P(1).
\]

Thus the first assertion of the theorem follows. The second assertion is shown analogously. □



Proof of Corollary 4.4. Assume that $H_1$ is valid with a change point in $\lfloor n\theta_0 \rfloor$. For $T_n$ defined in (2.1) we have



sup 

teK 



T n (9o, t) 



n sup 



2_^k=l W nk 2^ik=\n9 \+l w nk 



> V n ( sup 



n 

v|n0 o J 



n 



2^fc=i Wnk l^k=\n9 \+l W nk 



n 



n 



F{t) - F(t) 



- op(1) 



from Theorem 4.3. Now under $H_1$ the right-hand side converges to infinity in probability, from which consistency follows. □



B Proofs: auxiliary results 



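B.1 Results

For easy overview we first state the auxiliary results and then collect the proofs in the next subsection.



Lemma B.1 Under the assumptions of either Theorem 3.1 or Theorem 4.3 we have that

\[
\sup_{x \in I_n} \Bigl| \frac{1}{n c_n} \sum_{i=1}^{n} K^{(\nu)}\Bigl( \frac{x - X_{i-1}}{c_n} \Bigr) X_i^k - E\Bigl[ \frac{1}{n c_n} \sum_{i=1}^{n} K^{(\nu)}\Bigl( \frac{x - X_{i-1}}{c_n} \Bigr) X_i^k \Bigr] \Bigr| = O_P(e_n)
\]

for $\nu = 0, 1, 2$, $k = 0, 1, 2$, where $e_n = c_n^{-1/2} n^{-1/2} (\log n)^{1/2}$.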

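Lemma B.2 Under the assumptions of either Theorem 3.1 or Theorem 4.3 we have that

(i)
\[
\sup_{x \in I_n} \Bigl| \frac{\hat m(x) - m(x)}{\sigma(x)} \Bigr| = O_P\Bigl( \bigl( c_n^{-1/2} n^{-1/2} (\log n)^{1/2} + c_n^2 \bigr)\, q_n q_n^f q_n^\sigma \Bigr) = o_P(1),
\]
\[
\sup_{x \in I_n} \Bigl| \frac{\hat\sigma(x) - \sigma(x)}{\sigma(x)} \Bigr| = O_P\Bigl( \bigl( c_n^{-1/2} n^{-1/2} (\log n)^{1/2} + c_n^2 \bigr)\, (q_n q_n^f q_n^\sigma)^2 \Bigr) = o_P(1),
\]
where $q_n$, $q_n^f$, $q_n^\sigma$ are defined in (M) and (X) resp. (X').

(ii)
\[
\sup_{x \in I_n} \Bigl| \frac{d}{dx} \Bigl( \frac{\hat m(x) - m(x)}{\sigma(x)} \Bigr) \Bigr| = o_P(1), \qquad
\sup_{x \in I_n} \Bigl| \frac{d}{dx} \Bigl( \frac{\hat\sigma(x) - \sigma(x)}{\sigma(x)} \Bigr) \Bigr| = o_P(1).
\]

(iii)
\[
\sup_{x, y \in I_n,\, x \ne y} \frac{\Bigl| \frac{d}{dx} \Bigl( \frac{\hat m(x) - m(x)}{\sigma(x)} \Bigr) - \frac{d}{dy} \Bigl( \frac{\hat m(y) - m(y)}{\sigma(y)} \Bigr) \Bigr|}{|y - x|^{\delta}} = o_P(1),
\]
\[
\sup_{x, y \in I_n,\, x \ne y} \frac{\Bigl| \frac{d}{dx} \Bigl( \frac{\hat\sigma(x) - \sigma(x)}{\sigma(x)} \Bigr) - \frac{d}{dy} \Bigl( \frac{\hat\sigma(y) - \sigma(y)}{\sigma(y)} \Bigr) \Bigr|}{|y - x|^{\delta}} = o_P(1),
\]
with $\delta$ from (3.1).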



Lemma B.3 Under the assumptions of Theorem 3.1 we have

n L ns J 
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(ii) sup 
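Lemma B.4 Under the assumptions of either Theorem 3.1 or Theorem 4.3 we have that

\[
\frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} w_{nj} \Bigl( I\{\hat\varepsilon_j \le t\} - I\{\varepsilon_j \le t\} - F_{\varepsilon_j}\Bigl( t\, \frac{\hat\sigma}{\sigma}(X_{j-1}) + \frac{\hat m - m}{\sigma}(X_{j-1}) \Bigr) + F_{\varepsilon_j}(t) \Bigr) = o_P\Bigl(\frac{1}{\sqrt{n}}\Bigr)
\]

uniformly with respect to $s \in [0,1]$ and $t \in \mathbb{R}$, where under $H_0$ all $F_{\varepsilon_j}$ are equal to $F$.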



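Lemma B.5 Under the assumptions of Theorem 3.1 we have that

\[
\frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} w_{nj} \bigl( I\{\varepsilon_j \le t\} - F(t) \bigr) = \frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} \bigl( I\{\varepsilon_j \le t\} - F(t) \bigr) + o_P\Bigl(\frac{1}{\sqrt{n}}\Bigr)
\]

uniformly with respect to $s \in [0,1]$ and $t \in \mathbb{R}$.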



Lemma B.6 Under the assumptions of Theorem 4.3 we have that

[nOoi 



i £ ^ y (/ {sj <t}- F(t)) = o P (l), i £ u; fti (/ < - ify)) = o P (l) 

j=l j=\n6 \+l 



uniformly with respect to $t \in \mathbb{R}$.

B.2 Proofs



Proof of Lemma B.l. Let k G {0,1,2}. Throughout the proof we assume that \Xf\ < 
•nh logn for all % with b from assumptions (E), (E'). This is possible, because 

max|xf|>nhognj < J> (|X* | > »* logn) < ^^$>D^H = <>{1) 

by assumption (E) resp. (E'). 

Choose points Xj, j = 1,...,M* < M n = (b n - a n )/(e n c n ) (for I n = [a n ,b n ] from 
assumption (I) resp. (I')) such that I n is covered by intervals [xj — e n c n ,Xj + e n c n }. Now let 
Ki = KI hCjC \, where [ s bounded by K on the support [-C, C) of K for v e {0, 1, 2} 

(assumption (K)). Then by the mean value theorem we have 



< 



i=l x 



n 
1=1 



£7 — Xj_i 



X, — E 



X? -E 



x - Xj_x 



c «tr V 

«=1 x 



X? 



X 



n 
«=1 



£7 — 



X A 



for all x such that |x — Xj\ < e n c n . From this we obtain 



sup 



< max 

l<j<M* 



+ max e r 

i<i<M* 



- y a- m ( 



X — Xj_i 



X? -E 



Xj — Xi_\ 



X? -E 



i=l N 



X A 



Xf 



n 

— X> 

nc n ' 
1=1 



X — E 



n 

nc n ^— ' 

1=1 



%j -^Q— 1 



x 



+ 2e n max 

l<j<M* 



i=i x ' 
The last term on the right hand side can be bounded by 

—YK/ x ~ Xi - x 

ir — < 



2e n sup E 

X&I n 



nc r , 



1=1 



1 " r - 

2e n sup - Y / K i ( u ) E 

xe/„ n ^ J 



X 



X 



Xj_i = x — uc r , 



fxi_ x {x - uc n ) du 



< 2e n / Ki (u) du sup 



n 



1=1 



x fc 



Xi_i = x 



fx„(x) = 0(e n ) 



(B.l) 
(B.2) 
by a change of variable and by assumptions (K) and (Z) or (Z'), respectively, and model assumption (AR). In what follows we show that term (B.1) is of order $O_P(e_n)$; it can be shown analogously that (B.2) is of the same order. Define (for $j$ fixed)

v. = #M f x i- X i-i 



Xf-E 



K (u) I x j ~ x i-i 



X fc 



then the sequence (Y^j inherits the mixing conditions from (Xi)i due to 2.6.1 (ii) in Fan 
& Yao (2005). Further the variables are centered and bounded by 2Kn 1 ' b \ogn. We apply 
Liebscher's (1996) Theorem 2.1 to Y^i=i^i to obtain 



P I max 

1<3<M* 



< M n 4 exp 



nc, 



E A ' 



Xj — X-i 



X fc -E 



nc. 



E A " 



^7 1 



nc n M 2 e 2 



64 (1 + m n c n ) A(m n ) + ^MKm n n b e n logi 



n 



+ 4 — a(m r 



> Me n 
(B.3) 



for some M independent of j and for 



|_ne 2 (log M n ) 1 (logn) *J if n* 2c„ 2 (logn) 2 = 0(1) 
[n 1 ~ s c„ e n ( log M n ) ~ 1 (log n ) ~ 2 J otherwise . 



(B.4) 



Further, 



A(m n ) = 2(C + e n )X 2 sup max \\ E 



j=i-j* 



Xf 




1 



5+m, 

x sup — max N * / 

x 6 ;„ m„ o<S<n-™,- ^ W 



' i=s+l 



+ (2(C + e„)X) 2 sup 



1 



max 



x£J„ fn n 0<S<n-m n . 



= X 

5+m.n 

E B 



i=S+l 



IX 



X-i = ^ 



S+rrir, 



+ sup 



!X ' £ j n m^ 0<S<n-m n 



ax > i2 

n—m„ ' * 



max 



|i-il>i* 



I \r k \ v 



X-i — x,Xj-i — x 



with j* as in assumption (Z) resp. (Z'). In order to obtain B.3 from Liebscher's Theorem 
one has to show that 



max E 

0<T<n-l 



min(T+m n ,n) 

E « 

i=T+l 



2n 



< (m n c n + m 2 c 2 ) A(m n ). 
This can be done by some tedious calculations, which are omitted for the sake of brevity. 
By assumption (Z) resp. (Z') we have $A(m_n) = O(1)$. To see this consider for example for $k = 1$ the term



1 



S+m n 



sup — max y E 

xdJ n m n 0<S<n-m n 

S+m„ 



\XA 



Xi-i = x 



sup — max V E[\m(x) + a(x)e i \\fx i _ 1 {x) 

x£j n m n 0<S<n-m n ^ 



i=S+l 



1 



S+m„ 



< sup I (\m(x)\ +2|tr(a;)|) — max V fa^i 

xeJ n \ m n 0<S<n-m n ^ 



i=5+l 



(note that m n 1 = o(l)), and 



sup max 

x&J n m n 0<S<n-m, 



S+m n 

E 



=5+1 



xt 



Xi-i = x 



( 



1 

< sup — max 

x£j n n^n 0<S<n-m n 



m n + 



S+m n 

E 

i=S+l 



E 



Xt 



Xi-! = X 



\ 



X't 



Xi_i=x 



/ x ._ 1 (x)>l 



S+m n 



< 1 + sup — max E 

xeJn m n 0<S<n-m n .^-^ 



Xt 



Xi-i — x 



< 1 + sup (|m(x)| + |a(a;)|)^ — max 

x£j n \ m n 0<S<n-m n 



S+m 



=5+1 



0(1). 



The other terms in the definition of $A(m_n)$ are treated similarly.

Inserting the definitions of $M_n$, $e_n$, $m_n$ and $A(m_n) = O(1)$ one obtains with a simple calculation that (B.3) is of order $o(1)$ by the assumptions on the bandwidth $c_n$ and the mixing coefficient. This concludes the proof. □



Proof of Lemma B.2. We only present the proofs for the assertions on $\hat m$; those on $\hat\sigma$ follow by similar arguments using (A.6).



Let g k (x) = £? =1 K ( £= 5 =i ) X i for k e {°> x > 2 }- Then from Lemma 
follows that 



B.l 



it directly 



sup 

XGln 



QV 

dx v 



g k (x) - E [g k (x)} 



Or 



Op ( c n 2 "n 2 



(logra) 



(B.5) 
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Let further gk,i(x) = E [X* = x] fx^ix) for i = 1, . . . , n. Then 
E[g k (x)} - -^2gkA x ) 



sup 



sup 



sup 



dx h 



1=1 



-Xi-l = 1/ 



t=l 



1 1 



i=l 



1 n 

i(z - WC n )dw ^2 9k} fa) 

i=l 



Noting that by assumption (K), 

y (u) u^du = for n < v + 1, n ^ i/, y £T M (it) t/du = (-1) V! 
for z/ G {0, 1, 2}, from a Taylor expansion it further follows that 



sup 



d 1 ' 



(^2)lS 



1 n 

E [hfa)\ - 9k 

v. U i=l 

i=l 



(B.6) 



where the last equality follows from assumption (X) resp. (X') and (M) (note that g 0j ,, 
fx^, 9i,i = mfxi^, g2,i = (m 2 + a^fx^). Note further that 

'9ofa) ~ k^i=i9o,i( x ) 



inf 

XEl n 



9ofa) 


= inf 






lT, n i=i9oA x ) 





> 



j _ sn Pxei n \gofa) - \ Er=i flo,^) 



1 + O p (cn^n- 1 * (log n)k f n + c 2 n qi) (B.7) 



by (B.5), (B.6) and assumption (X) resp. (X'). 
Now for rh — gi/go from (2.3) we obtain 



sup \m{x) — m{x) 



sup 



sup 



9ifa) - J ELi 9i,i( x ) + m fa) (i EjLi 9o,i( x ) - 9ofa)) 



(gi( x ) - \ YTi=x 9ufa) + m fa) (J Er=i fltffa) - <?o(z))) 



go(g) 



«>,<(*) 



< 



inf 



a el,. 



sup 



^(^--y^i,^ 

n z — ' 



i=i 



sup|m(xj|sup 

XEln x£l„ 



n 



^29o,i( x )-9o(x) 



i=l 



P (c n ^n 2(logn)5g n g^ + c 2 n q n q^j 
by (B.5), (B.6), (B.7) and assumption (X) resp. (X'). The first assertion in (i) now directly follows from $(\inf_{x \in I_n} |\sigma(x)|)^{-1} = O(q_n^\sigma)$ (assumption (M)).

Differentiating, it is easy to see that from (B.5) and (B.6) it follows that



sup 

X£l„ 



d u f m(x) — m{x) 



dx 1 



a(x) 



j=Q 1=0 ^ x&In 



Qi ( 1 " 



Op c„ f n 2(logn)2 + c ; )q n (qiqn) 



.r 



where the last equality only holds for z/ = 0, 1 by our bandwidth conditions. The first 
assertion of (ii) follows. 

Finally, note that by considering the y\ < c n and \x - y\ > c n we have 



sup 

x,yGl n ,x^y 



d ( m(x)-m(x) \ d ( rh(y) —m(y) 



dx V CT ( X ') 



dy V CT (3/) 



< 2 • sup 

xei n 



\y — x\ s 
d f m(x) — m(x) 
dx 



a(x 



cj + sup 

XGln 



d 2 ( m(x) — m(x) 



dx 2 



a(x) 



sup \x — y\ 

x,yel n ,0<\x-y\<c n 



1-5 



Op (( C ; 5 n-i(logn)^ + c 2 n ) q n {q f n ql) 2 ) c~ 5 + P ((c^n^ (log n)i + c 2 n ) q n (q^ n ) 3 ) 



= o P (l) 

and the first assertion of (iii) follows. 
Proof of Lemma B.3[ (i). For 



□ 



d n (x) = w n (x)- E £ i 

n i=l 



—K ' x ~ x '- 



) nCn 2^k=l ^ 



x-X k -i 



x-Xk-i 



1 y^" j£ ( x—X) 
nc n Z^fc=l I Cn 



we have 



/ d n {x)fx {x)dx = ~y~]Zi J w n { 



fx {x) dx + o P ( 



n 



o P { 



n 



where last equality follows by a calculation of the variance and Chebyshev's inequality. The 
first equality can be derived by using 



sup 

XGln 



n 

— t. k 



k=l 



X - X fc _i 



fx {x) 



Op [c n 2 n 2(logn)2 + c 2 n 



(B.8) 
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which follows from the proof of Lemma B.2 (note that with the notations used there, 

71 1 Y17=i 9oA x )) an d results from Lemma 



nc n) 1 J2l=i K((x-X k -i)/cn) = g Q (x 
(i) for the AR-model with m 



B.2 



0, a 



fx (x) 
= 1. 



Now assertion (i) is equivalent to 



sup 

ae[0,l] 



1 '-'"J 



n 



with centered functions oL 



proof of Lemma B.2 and by results from Lemma 



d n — J d n (x)fx {x) dx. By arguments similar to those in the 

for m 



B.2 



0, a 



1 we have P(d n G 



V n ) — > 1 for n — > oo for the function classes 



d: J„. — )• 



max I sup |a(x)|, sup \d (x)\ + sup : < 1, 



x£l„ 



x,y£l n ,x^y \X y\ 



sup \d(x)\ < z n , I d(y)f Xo (y)dy = 
xei„ 



— i i i 

with z n = c n 2 n~2 (log n) 2 (log n) 2 . Thus it remains to show that 



sup sup 

se[0,l] d£V„ 



[ns\ 



n 



(B.9) 



To this end let e n = n 2 (logn) l . It follows from Theorem 2.7.1 in van der Vaart & Wellner 
(1996) that M n2 < exp(2 I +«A'(2 + b n — an)^^ 5 ) balls of radius e n with respect to the 
supremum norm || ■ ||/„ on the interval I n are needed to cover T> n . Here the constant K only 
depends on 5. Let d±, . . . ,dM n2 denote centers of those balls. We may assume that those 
functions are elements of V n , too (see Pollard (1990), p. 10). Further let = s 1 < . . . < 
sm u1 = 1 segment [0, 1] in intervals of length < e n / z n such that M nl < j 2 -. Then it can be 
shown that



sup sup 

se[o,i] dev n 



< max max 

l<h<M nl l<k<M n2 



j V ns h\ 

- ^2 d k( x j-i, 



3=1 



+ max sup sup 

l<ft<M„i se [ Q) i] mit \s-s h \<e n /z n d£V n 



+ max max sup 

h l<k<M n2d&Vn mit \\d-d k \\ In <e r , 



1 n ( 

3=1 V 

, V ns h\ 

- (d(*i-0 - 4(^-0) 



< max max 

l<h<M n \ l<k<M n2 



[ns h \ 



- dk( x 3-h 



3=1 



+ o 



By an application of Liebscher's (1996) Theorem 2.1 to random variables Yi = c4(Xj_i)J { - < 
(for k, h fixed) one can show the existence of some constant M such that 



P I max max 

l<h<M nl l<k<M n2 



M nl M„ 2 
h=l k=l 



- Y d k( X 3-l) 
3= 

- Y d k( X 3~l) 



> Me, 



3=1 



>Me n 



n 2 M 2 e 2 



< M nl M n2 4exp 



o(D. 



n 



§An[ne n cl\z 2 + \nMe n [ne n cA\z r , 



+ 4 —a{\ne n cl\) 

[ne n ck\ 



Details are omitted for the sake of brevity. From this the rate Op(e n ) = op(n 1 ^ 2 ) follows 
for flB~9j ). 

(ii). We only describe the main steps of this proof. The random denominator can be replaced 
by the true density fx due to (B.8). Now define 

it \ i \ 1 X^ T ^f x ~ x i-i\ <r(Xi-i) — a(x) 
d n (x) = w n (x) y K 1 



1=1 



<r(x)fx (x) 



Then, J d n (x)fx (x) dx = op(n l l 2 ) can be shown by Chebyshev's inequality and for d n 
d n — j d n (x) f x Q {x) dx the assertion 



sup 

se[o,i] 



[ns\ 

~^2 d n{ X j-l] 



3=1 



Op{ 



n 
is shown analogously to the proof of (i).

(iii). The assertion can be proved by the same methods. □



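Proof of Lemma B.4. Let $d_{n1} = (\hat m - m)/\sigma$ and $d_{n2} = \hat\sigma/\sigma$. Now the assertion of the lemma is equivalent to

\[
\sup_{s \in [0,1]} \sup_{t \in \mathbb{R}} |H_n(s, t, d_{n1}, d_{n2}) - H_n(s, t, 0, 1)| = o_P\Bigl(\frac{1}{\sqrt{n}}\Bigr)
\]

with

\[
H_n(s, t, d_1, d_2) = \frac{1}{n} \sum_{j=1}^{\lfloor ns \rfloor} w_{nj} \Bigl( I\{\varepsilon_j \le t\, d_2(X_{j-1}) + d_1(X_{j-1})\} - F_{\varepsilon_j}\bigl( t\, d_2(X_{j-1}) + d_1(X_{j-1}) \bigr) \Bigr).
\]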



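For the proof we may assume that $\sup_{x \in I_n} |d_{n1}(x)| < 1$, $\inf_{x \in I_n} d_{n2}(x) > \frac{1}{2}$ and $|\varepsilon_j| \le \sqrt{n} \log n$ for all $j = 1, \dots, n$, because $\sup_{x \in I_n} |d_{n1}(x)| = o_P(1)$, $\sup_{x \in I_n} |d_{n2}(x) - 1| = o_P(1)$ by Lemma B.2, and further

\[
P\Bigl( \max_{1 \le j \le n} |\varepsilon_j| > \sqrt{n} \log n \Bigr) \le \frac{1}{n (\log n)^2} \sum_{j=1}^{n} E\bigl[ \varepsilon_j^2\, I\{\varepsilon_j^2 > n (\log n)^2\} \bigr] \le \frac{1}{(\log n)^2} = o(1)
\]

by the model assumption $E[\varepsilon_j^2] = 1$ for all $j \in \mathbb{Z}$.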

Note that /{^ < t • e? n2 (.Xj_i) + d n i(Xj-x)} - I{Sj < t} = for all t such that |t| > 
i/nlogn, for all j = 1, . . . ,n. Thus, by some simple estimations, 

sup sup \H n (s,t, d nl ,d n2 ) -H n (s,t, 0,1) | 

se[0,l] |t|>^/nlogn 

j L ns J 

< SUp SUp - ^ W nj ~ F e 3 (* * d^G^i-l) + d nl (X,_i)) + ^ (*) 

se[o,i] |t|>v^;iogn n . =1 



3=1 V V 77 3=1 

1 



n(\ogn) 2 ) ° \y/n 



where in the last line we have applied that for t > 1 



n n n , v 

-Ea-^W) = -E^>*) ^ ^-E^{^>^}] = oM (b.io) 

3=1 3=1 3=1 

by the model assumption E[e 2 } = 1 and analogously for t < —1, ^ $^?=i = ^(lA 2 )- 
For the remainder of the proof we therefore only need to consider $|t| \le \sqrt{n} \log n$. Define sequences of function classes by



d: I n -> 



r i u m i ,// \n \d'(x) — d'(y)\ ^ 1 
max{ sup | d[x) |, sup | a [x)\\ + sup : - < 1, 



x,yel n ,x^y W x \ 
sup \d{x)\ < z nl logn } for z nl = ( c n 2 n~^(\ogn)i + c 2 n ) q n q S n q a n 



2.n 



xei n 



d : I n -»■ 



max{sup |d(a;)|, sup |d'(a;)|} + sup J — ^ , fr^ < 2, 



a;6in 



x,yel„,x^y \U X\ 



inf > -, sup \d(x) - 1| < z n2 \ogn \ for z n2 = fc„ 2 n 2(logn)2 + 4) {q n qlq^ 2 
xei n I x& i n 



Then by Lemma B.2, P{d n \ G Pi, n ) — >■ 1, P{d n 2 G %,n) ~~ ^ 1 and it remains to show that 



sup \H n (s,t,di,d 2 ) - H n (s,t,0, l) \ = o p 

s£[0,l],|t|<V"logn, \-\/Tl 

To this end we apply covering arguments. Let e n = min{|, rT^ (logn) -1 }. The e„-covering 
numbers of both function classes with respect to the supremum norm on J„ can be bounded 
by M n < exp(c(2 + b n — a„)e n 1 ^ 1+5 ' ) ), see Theorem 2.7.1 by van der Vaart & Wellner (1996). 
Let dn, . . . , diM„ and d 2 i, ■ ■ ■ , d 2 M n , respectively, denote the corresponding centers of covering 
balls. Note that then sup x£ln < l + e n and d 2 i(x) G [| — e n , 2 + e n ] for all a; G I n . Let 

further the intervals [0, 1] and [— -y/nlogn, A/nlogn] be segmented by points = Si < . . . < 
$M ns = 1 and —-y/nlogn = t% < . . . < t Mnt = y/n\ogn, respectively, in segments of length 
< e n such that the number of points are bounded by M ns < l/e n and M nt < 2y/n\ogn/e n . 
Let || • ||/ B denote the supremum norm on I n . Then 



sup 

s£[0,l],\t\<^n\ogn, 



\HJs, t, di, d 2 ) - HJs, t, 0, 1) 



< meLx\H n (s h ,ti,di k ,d 2 i) - H n (s h ,U,0,l)\ 



+ max 

h 



sup 

s — s h \<e n ,\t\<*/rilogn, 



\H n (s, t, di, d 2 ) — H n (s h , t, dx,d 2 



+ max sup 

h,i,k,l |i-tj|<Sn. 

Il d i- d ifcll/ n <' f i'll [i 2- d 2;ll/ n << r i 



+ max 

h 



sup 

\s-s h \<€ n ,\t\<y/nlogn 



H n (sh, t, di, d 2 ) — H n (sh, ti, dik, d 2 i) 
H n (s h ,t,0,l)-H n (s,t,0,l)\ 



+ max sup \H n (s h ,ti,0, 1) - H n (s h ,t, 0, 1)|, 



h >* \t-ti\<e n 



(B.11) 
(B.12) 

(B.13) 
(B.14) 






where the maximum is always with respect to h G {1, . . . , M ns }, i G {1, . . . , M nt }, k, I G 
{l,...,M n }. 



To further bound the term (B.12) first consider fixed h G {1, . . . , M ns } } i G {1, . . . , M nt } } 
k,l G {!,..., M n } such that £j > e n (the other case is treated analogously). Then 



sup 

|t-t»l<»n, 

ll d l- d lfc II J„ < E n .i ll d 2- d 2i II J n < f n 



< 



sup 

|t-ij<e„, 



ll d l- d lfcll/ n < £ "'ll d 2- d 2!ll/ n < E n 



+ sup 

|t-tj|<e n , n . 

ll d l-dlfell/ n < E n,l|d2-d2!ll/ n <'En 3 ~ 



H n (sh, t, di, d 2 ) — H n (sh, U, dik, d 2 i)\ 
1 n 

-^w nj \I{Ej < td 2 (X j _ 1 ) + d 1 (X J _ 1 )} - I{ £j < td 2l {X j _ 1 ) + d lk (Xj_ 

" 3=1 

-J^WnjlF^tduiX^x) + d lk {Xj^)) - F^td^X^) + d x (X,-_i))| 



< H n [l, ti + e n , d lk + e n , d 2 i + e n ) — H n (l, U — e m d\ k — e n , d 2 \ — e r 
2 n ( 

+ - 7J w n j F £j ((ti + e n )(c/ 2i (X i „i) + e n ) + + e n ) 



3=1 



- F e Mi - e n )(d 2 i{X j _ l ) - e n ) + d lfc (X,-_i) - e n ) 



H n [l, ti + e n , dife + e n , <i 2 ; + e n ) — H n (l, ti — e n , d% k — e n , d 2 i — e n ) + o ■ , . 



1 



where the last step follows from the mean value theorem, assumption (F) resp. (F') and 



e n = o(l/ y/n). Similarly for (B.14) we obtain 



max sup \H n (s h ,ti,0, 1) - H n (s h ,t, 0, 1)| 

h ' 1 \t-U\<e n 

< max.\H n (l,ti + e n ,Q,l) - H n (l,ti -e n ,0, 1)| +0 



(B.15) 



For (B.ll ) we have 



max sup \H n (s,t,di,d 2 ) - H n (s h ,t,di,d 2 ) 



I s — s fil< e ri.l t l<\/™ lo g' 

-1 n 



< max sup — 

h \s-s h \<e n nj^ 



I{-<s\-l\ 3 -<s h 

n n 



, 1\ 1 ( 1 

< max sup [ \s — s h \ -\ — < e n H — = o 



\s—Sh\<€ n 



n 



n 
and analogously for (B.13) the same rate $o(1/\sqrt{n})$. Altogether we have shown that



sup \H n (s,t, di,d 2 ) - H n (s,t,0, 1)| 

se[0,l],|t|<v/n log n, 

< max \H n (s h ,U,d lk ,d 2 i) - H n (s h ,U,0, 1)| (B.16) 
+ max |if n (l, £j + e n , c?i fc + e n , g? 2 j + e«) — ^ — e n , ^ifc — <^2Z — e n )| 

i,k,l 

1 

+ max |if„(l,ti + e„,0, 1) - H n (l,U - e n , 0,1) | + o( 



To conclude the proof we exemplarily consider term (B.16); the other terms are treated 
analogously. For all rj > we have 

P I ^ramax \H n (s hj t ij di k ,d 2 i) - H n (s h ,ti,0, 1)| > rj ) 

\ h,i,k,l J 

< X^* (y^\ H n( s h,U,d lk ,d 2 i) - H n (s h ,U,0, 1)| >rj) 

h=l i=l fe=l /=1 



by an application of Theorem 1.6 by Freedman (1975). To see this define (for h, i, k, I fixed) 

Yj = w njl{^ < Sft} (I{ej < UM x J-i) + dikiXj^)} - I{ £j < U] 

-F e .{Udn{Xj- X ) + d lk {X^ x )) + F Ej {U)) , 

and note that \Yj\ < 2, E\Yj\X , . . . , X,-_ x ] = = |X , • • • , X,-_i] as well as 

n n 

^E [Y-\X , . . • < ^^(n) = Yl SU P l^-M^z) + d ik{x)) - F £j {t^\ 

3=1 3=1 XS/n 

< Cn(z n i log n + z n2 log n + e n ) 

by an application of the mean value theorem, the definition of $\mathcal D_{1,n}$, $\mathcal D_{2,n}$ and assumption (F) resp. (F'), for some constant $C$ independent of $i, k, l$. Hence, inserting the bounds on $M_n$, $M_{ns}$ and $M_{nt}$, (B.16) can be bounded by



$$2\exp\Big(\log\Big(\frac{n\log n}{\ldots}\Big) + 2c(2+b_n-a_n)\,\epsilon_n^{-\frac{1}{1+\delta}} - \frac{n\eta^2}{4\eta\sqrt n + 2Cn(z_{n1}\log n + z_{n2}\log n + \epsilon_n)}\Big) = o(1),$$

which follows from bandwidth condition (3.1). □
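The denominator $4\eta\sqrt n + 2Cn(\cdot)$ in the exponent can be traced back to Freedman's inequality. The following is only a sketch, assuming the inequality in its usual form for martingale differences bounded by one, and assuming that $H_n(s_h,t_i,d_{1k},d_{2l}) - H_n(s_h,t_i,0,1) = n^{-1}\sum_{j} Y_j$, as the definition of $Y_j$ suggests. The symbols $Z_j$, $S_n$, $V_n$, $a$, $b$, $V$ and $\mathcal F_{j-1}$ are notation introduced here for illustration.

```latex
% Freedman-type bound for martingale differences Z_j with |Z_j| <= 1:
P\big(S_n \ge a,\ V_n \le b\big) \le \exp\Big(-\frac{a^2}{2(a+b)}\Big),
\qquad S_n = \sum_{j=1}^{n} Z_j, \quad V_n = \sum_{j=1}^{n} E[Z_j^2 \mid \mathcal F_{j-1}].

% Rescale: Z_j = Y_j/2, a = \sqrt{n}\,\eta/2, b = V/4,
% where V = Cn(z_{n1}\log n + z_{n2}\log n + \epsilon_n) bounds \sum_j E[Y_j^2 \mid \cdot].
\frac{a^2}{2(a+b)}
  = \frac{n\eta^2/4}{\sqrt{n}\,\eta + V/2}
  = \frac{n\eta^2}{4\eta\sqrt{n} + 2V}.
```

Applying the bound to both $Y_j$ and $-Y_j$ accounts for the leading factor 2, since $\{\sqrt n\,|n^{-1}\sum_j Y_j| > \eta\} = \{|\sum_j Y_j| > \sqrt n\,\eta\}$.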
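Proof of Lemma B.5 Analogous to the proof of Lemma B.4 we may assume that $|\varepsilon_j| \le \sqrt n\log n$ for all $j = 1,\ldots,n$. Then

$$\begin{aligned}
&\sup_{s\in[0,1],\ |t|>\sqrt n\log n} \Big|\frac1n\sum_{j=1}^{[ns]} (w_n(X_{j-1})-1)\big(I\{\varepsilon_j\le t\} - F(t)\big)\Big| \\
&\quad\le \sup_{s\in[0,1],\ t>\sqrt n\log n} \frac1n\sum_{j=1}^{n} |w_n(X_{j-1})-1|\,(1-F(t)) + \sup_{s\in[0,1],\ t<-\sqrt n\log n} \frac1n\sum_{j=1}^{n} |w_n(X_{j-1})-1|\,F(t) \\
&\quad\le \frac1n\sum_{j=1}^{n} \big(1-F(\sqrt n\log n)\big) + \frac1n\sum_{j=1}^{n} F(-\sqrt n\log n) \\
&\quad= O\Big(\frac{1}{n(\log n)^{2}}\Big) = o\Big(\frac{1}{\sqrt n}\Big),
\end{aligned}$$

where the second last equality follows from (B.10).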



Now let $\epsilon_n = n^{-1/2}(\log n)^{-1}$ and let $0 = s_1 < \ldots < s_{M_{n,1}} = 1$ be such that $s_h - s_{h-1} \le \epsilon_n$ for all $h = 2,\ldots,M_{n,1}$ and $M_{n,1} \le 1/\epsilon_n$. Further let $-\sqrt n\log n = t_1 < \ldots < t_{M_{n,2}} = \sqrt n\log n$ be such that $t_i - t_{i-1} \le \epsilon_n$ for all $i = 2,\ldots,M_{n,2}$ and $M_{n,2} \le 2\sqrt n\log n/\epsilon_n$. Then we have



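$$\begin{aligned}
&\sup_{s\in[0,1],\ t\in[-\sqrt n\log n,\,\sqrt n\log n]} \Big|\frac1n\sum_{j=1}^{[ns]} (w_n(X_{j-1})-1)\big(I\{\varepsilon_j\le t\}-F(t)\big)\Big| \\
&\le \max_{1\le h\le M_{n,1}}\ \sup_{|s-s_h|\le\epsilon_n,\ t\in[-\sqrt n\log n,\,\sqrt n\log n]} \Big|\frac1n\sum_{j=1}^{[ns]} (w_n(X_{j-1})-1)\big(I\{\varepsilon_j\le t\}-F(t)\big) - \frac1n\sum_{j=1}^{[ns_h]} (w_n(X_{j-1})-1)\big(I\{\varepsilon_j\le t\}-F(t)\big)\Big| \qquad \text{(B.18)} \\
&\quad + \max_{1\le h\le M_{n,1},\ 1\le i\le M_{n,2}}\ \sup_{|t-t_i|\le\epsilon_n} \Big|\frac1n\sum_{j=1}^{[ns_h]} (w_n(X_{j-1})-1)\big(I\{\varepsilon_j\le t\}-F(t)\big) - \frac1n\sum_{j=1}^{[ns_h]} (w_n(X_{j-1})-1)\big(I\{\varepsilon_j\le t_i\}-F(t_i)\big)\Big| \qquad \text{(B.19)} \\
&\quad + \max_{1\le h\le M_{n,1},\ 1\le i\le M_{n,2}} \Big|\frac1n\sum_{j=1}^{[ns_h]} (w_n(X_{j-1})-1)\big(I\{\varepsilon_j\le t_i\}-F(t_i)\big)\Big| \qquad \text{(B.20)} \\
&\le \max_{1\le h\le M_{n,1},\ 1\le i\le M_{n,2}} \Big|\frac1n\sum_{j=1}^{[ns_h]} (w_n(X_{j-1})-1)\big(I\{\varepsilon_j\le t_i\}-F(t_i)\big)\Big| + o\Big(\frac{1}{\sqrt n}\Big) \\
&\quad + \max_{1\le i\le M_{n,2}} \Big|\frac1n\sum_{j=1}^{n} (w_n(X_{j-1})-1)\big(I\{\varepsilon_j\le t_i+\epsilon_n\}-F(t_i+\epsilon_n)\big)\Big| \qquad \text{(B.21)} \\
&\quad + \max_{1\le i\le M_{n,2}} \Big|\frac1n\sum_{j=1}^{n} (w_n(X_{j-1})-1)\big(I\{\varepsilon_j\le t_i-\epsilon_n\}-F(t_i-\epsilon_n)\big)\Big| \qquad \text{(B.22)} \\
&\quad + 2\max_{1\le i\le M_{n,2}} \frac1n\sum_{j=1}^{n} \big(F(t_i+\epsilon_n) - F(t_i-\epsilon_n)\big). \qquad \text{(B.23)}
\end{aligned}$$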



To obtain the last inequality it can be shown analogously to the treatment of (B.11) in the proof of Lemma B.4 that (B.18) is of order $O(\epsilon_n) = o(1/\sqrt n)$. Further the bounding of (B.19) by the sum of (B.21), (B.22) and (B.23) is straightforward by using monotonicity of indicator and distribution functions.
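The discretisation behind these bounds can be sketched numerically. The snippet below is a minimal illustration, not the authors' construction: it assumes equally spaced grids (the proof only requires a mesh width of at most $\epsilon_n$), and the function name `discretisation_grids` is hypothetical. It builds the $s$-grid on $[0,1]$ and the $t$-grid on $[-\sqrt n\log n, \sqrt n\log n]$ for $\epsilon_n = n^{-1/2}(\log n)^{-1}$.

```python
import math
import numpy as np


def discretisation_grids(n: int):
    """Return (eps_n, s_grid, t_grid) for an equally spaced covering.

    Assumption for illustration: equally spaced points including both
    endpoints; the proof only needs consecutive points at distance <= eps_n.
    """
    eps_n = n ** -0.5 / math.log(n)        # eps_n = n^{-1/2} (log n)^{-1}
    t_max = math.sqrt(n) * math.log(n)     # truncation level sqrt(n) log n

    # ceil(length / eps_n) intervals, hence one more grid point than that,
    # give a mesh width of at most eps_n.
    m1 = math.ceil(1.0 / eps_n) + 1
    m2 = math.ceil(2.0 * t_max / eps_n) + 1

    s_grid = np.linspace(0.0, 1.0, m1)
    t_grid = np.linspace(-t_max, t_max, m2)
    return eps_n, s_grid, t_grid


if __name__ == "__main__":
    n = 1000
    eps_n, s_grid, t_grid = discretisation_grids(n)
    # Number of grid pairs (h, i) entering the union bound, next to the
    # order 2 n^{3/2} (log n)^3 obtained from the two cardinality bounds.
    print(len(s_grid) * len(t_grid), 2 * n ** 1.5 * math.log(n) ** 3)
```

The number of grid pairs grows only polynomially in $n$, which is why an exponential inequality for each fixed pair $(s_h, t_i)$ is enough to control the maximum over the grid.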



Now by the mean value theorem and assumption (F) it follows that (B.23) is of order $O(\epsilon_n) = o(1/\sqrt n)$. The remaining terms (B.20), (B.21), (B.22) are treated in the same way and we will only consider (B.20) in what follows. For this term we have for each $\eta > 0$ that





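$$\begin{aligned}
&P\Big(\sqrt n \max_{1\le h\le M_{n,1},\ 1\le i\le M_{n,2}} \Big|\frac1n\sum_{j=1}^{[ns_h]} (w_n(X_{j-1})-1)\big(I\{\varepsilon_j\le t_i\}-F(t_i)\big)\Big| > \eta\Big) \\
&\le \sum_{h=1}^{M_{n,1}}\sum_{i=1}^{M_{n,2}} P\Big(\sqrt n\,\Big|\frac1n\sum_{j=1}^{[ns_h]} (w_n(X_{j-1})-1)\big(I\{\varepsilon_j\le t_i\}-F(t_i)\big)\Big| > \eta\Big) \\
&\le M_{n,1}M_{n,2}\Big(4\exp\Big(-\frac{n\eta^2}{64\,n\,\omega_n + \frac83\sqrt n\,\eta\,\lfloor n^{1/2}(\log n)^{-2}\rfloor}\Big) + 4\,\frac{n}{\lfloor n^{1/2}(\log n)^{-2}\rfloor}\,\alpha\big(\lfloor n^{1/2}(\log n)^{-2}\rfloor\big)\Big) \\
&= o(1)
\end{aligned}$$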



(where $\alpha(\cdot)$ denotes the $\alpha$-mixing coefficient) by an application of Theorem 2.1 by Liebscher (1996) and the bandwidth conditions. Details are omitted for the sake of brevity, but note that



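$$\omega_n = \max_{0\le s\le n-\lfloor n^{1/2}(\log n)^{-2}\rfloor}\ \frac{1}{\lfloor n^{1/2}(\log n)^{-2}\rfloor} \sum_{j=s+1}^{s+\lfloor n^{1/2}(\log n)^{-2}\rfloor} E\big[(w_n(X_{j-1})-1)^2\big] \le \int_{-\infty}^{\ldots} f_{X_0}(x)\,dx + \int_{\ldots}^{\infty} f_{X_0}(x)\,dx = o(\ldots)$$

by assumption (I). □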
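The omitted details amount largely to an order-of-magnitude computation. The following is only a sketch. It reads the exponential term of the bound as $4\exp\big(-n\eta^2/(64\,n\,\omega_n + \tfrac83\sqrt n\,\eta\,N)\big)$ with $N = \lfloor n^{1/2}(\log n)^{-2}\rfloor$ (the symbol $N$ is shorthand introduced here) and uses the grid bounds $M_{n,1}\le 1/\epsilon_n$, $M_{n,2}\le 2\sqrt n\log n/\epsilon_n$ with $\epsilon_n = n^{-1/2}(\log n)^{-1}$.

```latex
% Number of grid pairs:
M_{n,1} M_{n,2} \le \frac{2\sqrt{n}\,\log n}{\epsilon_n^{2}} = 2\, n^{3/2} (\log n)^{3}.

% Denominator of the exponent, using N \le n^{1/2} (\log n)^{-2}:
64\, n\, \omega_n + \tfrac{8}{3}\sqrt{n}\,\eta\, N
  \le n \Big( 64\, \omega_n + \tfrac{8}{3}\,\eta\, (\log n)^{-2} \Big).

% Hence
M_{n,1} M_{n,2} \exp\Big( -\frac{n\eta^{2}}{64\, n\, \omega_n + \tfrac{8}{3}\sqrt{n}\,\eta\, N} \Big)
  \le 2\, n^{3/2} (\log n)^{3}
      \exp\Big( -\frac{\eta^{2}}{64\, \omega_n + \tfrac{8}{3}\,\eta\, (\log n)^{-2}} \Big).
```

The right-hand side tends to zero as soon as $\omega_n \log n \to 0$, because the exponent then grows faster than any fixed multiple of $\log n$. The mixing term $4 M_{n,1} M_{n,2}\,(n/N)\,\alpha(N)$ is not covered by this computation and has to be controlled separately through the decay of the mixing coefficients.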



Proof of Lemma B.6 We only give arguments for the first statement. Similarly to the proof of Lemma B.5 one can show that

$$\frac1n\sum_{j=1}^{[n\theta_0]} w_{nj}\big(I\{\varepsilon_j\le t\} - F(t)\big) = \frac1n\sum_{j=1}^{[n\theta_0]} \big(I\{\varepsilon_j\le t\} - F(t)\big) + o_P(1)$$

(applying assumptions (I') and (X')). Then the assertion follows by standard arguments for the empirical distribution function of iid data. □